ADR-002: Multi-tenancy Design
Status: Accepted
Date: 2026-01-27
Decision Makers: RuVector Architecture Team
Technical Area: Security, Data Architecture
Context and Problem Statement
RuvBot must serve multiple organizations (tenants) and users within each organization while maintaining strict data isolation. A breach of tenant boundaries would:
- Violate privacy and compliance requirements (GDPR, SOC2, HIPAA)
- Expose sensitive business information
- Destroy trust in the platform
- Create legal liability
The multi-tenancy design must address:
- Data Isolation: No cross-tenant data access
- Authentication: Identity verification at multiple levels
- Authorization: Fine-grained permission control
- Resource Limits: Fair usage and cost allocation
- Audit Trails: Complete visibility into access patterns
Decision Drivers
Security Requirements
| Requirement | Criticality | Description |
|---|---|---|
| Zero cross-tenant leakage | Critical | No tenant can access another tenant's data |
| Row-level security | Critical | Database enforces isolation, not just application |
| Token-based auth | High | Stateless, revocable authentication |
| RBAC + ABAC | High | Role and attribute-based access control |
| Audit logging | High | All data access logged with tenant context |
Operational Requirements
| Requirement | Target | Description |
|---|---|---|
| Tenant provisioning | < 30s | New tenant setup time |
| User provisioning | < 5s | New user creation time |
| Quota enforcement | Real-time | Immediate limit enforcement |
| Data export | < 1h for 1GB | GDPR data portability |
| Data deletion | < 24h | GDPR right to erasure |
Decision Outcome
Adopt Hierarchical Multi-tenancy with RLS and JWT Claims
We implement a three-level hierarchy with PostgreSQL Row-Level Security (RLS) as the primary isolation mechanism.
+---------------------------+
|       ORGANIZATION        |  Billing entity, security boundary
|---------------------------|
| id: UUID                  |
| name: string              |
| plan: Plan                |
| settings: OrgSettings     |
| quotas: ResourceQuotas    |
+-------------+-------------+
              |
              | 1:N
              v
+---------------------------+
|         WORKSPACE         |  Project/team boundary
|---------------------------|
| id: UUID                  |
| orgId: UUID (FK)          |
| name: string              |
| settings: WorkspaceSettings|
+-------------+-------------+
              |
              | 1:N
              v
+---------------------------+
|           USER            |  Individual identity
|---------------------------|
| id: UUID                  |
| workspaceId: UUID (FK)    |
| email: string             |
| roles: Role[]             |
| preferences: Preferences  |
+---------------------------+
Tenant Isolation Layers
Layer 1: Network Isolation
          Internet
              |
              v
          +---+---+
          |  WAF  |   Rate limiting, DDoS protection
          +---+---+
              |
              v
          +---+---+
          | LB/TLS|   TLS termination, tenant routing
          +---+---+
              |
    +---------+---------+---------+
    |         |         |         |
+---v---+ +---v---+ +---v---+ +---v---+
| Org A | | Org B | | Org C | | Org D |  Virtual host routing
+-------+ +-------+ +-------+ +-------+
Layer 2: Authentication & Authorization
// JWT token structure with tenant claims
interface RuvBotToken {
// Standard claims
sub: string; // User ID
iat: number; // Issued at
exp: number; // Expiration
// Tenant claims (always present)
org_id: string; // Organization ID
workspace_id: string; // Workspace ID
// Permission claims
roles: Role[]; // User roles
permissions: string[];// Explicit permissions
// Resource claims
quotas: {
sessions: number;
messages_per_day: number;
memory_mb: number;
};
}
// Role hierarchy
enum Role {
ORG_OWNER = 'org:owner',
ORG_ADMIN = 'org:admin',
WORKSPACE_ADMIN = 'workspace:admin',
MEMBER = 'member',
VIEWER = 'viewer',
API_KEY = 'api_key',
}
// Permission matrix
const PERMISSIONS: Record<Role, string[]> = {
'org:owner': ['*'],
'org:admin': ['org:read', 'org:write', 'workspace:*', 'user:*', 'billing:read'],
'workspace:admin': ['workspace:read', 'workspace:write', 'user:read', 'user:invite'],
'member': ['session:*', 'memory:read', 'memory:write', 'skill:execute'],
'viewer': ['session:read', 'memory:read'],
'api_key': ['session:create', 'session:read'],
};
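The matrix above uses wildcard grants (`*`, `workspace:*`) but does not spell out how they are matched. The following is a minimal sketch of one possible check, assuming `*` grants everything and `resource:*` grants every action on that resource. The names `RoleName`, `ROLE_PERMISSIONS`, `grantMatches`, and `hasPermission` are illustrative and not part of the ADR.

```typescript
// Illustrative only: role names mirror the Role enum values in the ADR.
type RoleName =
  | 'org:owner' | 'org:admin' | 'workspace:admin'
  | 'member' | 'viewer' | 'api_key';

const ROLE_PERMISSIONS: Record<RoleName, string[]> = {
  'org:owner': ['*'],
  'org:admin': ['org:read', 'org:write', 'workspace:*', 'user:*', 'billing:read'],
  'workspace:admin': ['workspace:read', 'workspace:write', 'user:read', 'user:invite'],
  'member': ['session:*', 'memory:read', 'memory:write', 'skill:execute'],
  'viewer': ['session:read', 'memory:read'],
  'api_key': ['session:create', 'session:read'],
};

// Assumed semantics: '*' matches anything, 'resource:*' matches any action
// on that resource, and every other grant must match exactly.
function grantMatches(grant: string, requested: string): boolean {
  if (grant === '*' || grant === requested) return true;
  if (grant.endsWith(':*')) {
    // Drop the '*' but keep the ':' so 'user:*' does not match 'username:read'.
    return requested.startsWith(grant.slice(0, -1));
  }
  return false;
}

// A request is allowed if any role-derived or explicit grant matches it.
function hasPermission(
  roles: RoleName[],
  explicitPermissions: string[],
  requested: string
): boolean {
  const grants = [
    ...explicitPermissions,
    ...roles.flatMap((role) => ROLE_PERMISSIONS[role]),
  ];
  return grants.some((grant) => grantMatches(grant, requested));
}
```

With these assumed semantics, `hasPermission(['viewer'], [], 'session:create')` is denied, while the same request succeeds for a `member` through the `session:*` grant.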
Layer 3: Database Row-Level Security
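-- Enable RLS on all tenant-scoped tables.
-- Table owners bypass RLS unless FORCE ROW LEVEL SECURITY is also set, so the application should connect as a non-owner role.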
ALTER TABLE conversations ENABLE ROW LEVEL SECURITY;
ALTER TABLE memories ENABLE ROW LEVEL SECURITY;
ALTER TABLE sessions ENABLE ROW LEVEL SECURITY;
ALTER TABLE skills ENABLE ROW LEVEL SECURITY;
ALTER TABLE trajectories ENABLE ROW LEVEL SECURITY;
-- Create tenant context function
CREATE OR REPLACE FUNCTION current_tenant_id()
RETURNS UUID AS $$
BEGIN
RETURN current_setting('app.current_org_id', true)::UUID;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
CREATE OR REPLACE FUNCTION current_workspace_id()
RETURNS UUID AS $$
BEGIN
RETURN current_setting('app.current_workspace_id', true)::UUID;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
-- RLS policies for conversations
CREATE POLICY conversations_isolation ON conversations
FOR ALL
USING (org_id = current_tenant_id())
WITH CHECK (org_id = current_tenant_id());
-- RLS policies for memories (workspace-level)
CREATE POLICY memories_isolation ON memories
FOR ALL
USING (
org_id = current_tenant_id()
AND workspace_id = current_workspace_id()
);
-- Read-only policy for cross-workspace memory sharing
CREATE POLICY memories_shared_read ON memories
FOR SELECT
USING (
org_id = current_tenant_id()
AND is_shared = true
);
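The policies above depend on `app.current_org_id` and `app.current_workspace_id` being set for every request. A minimal sketch of how the application might do that per transaction is shown below, using PostgreSQL's `set_config(name, value, is_local)` with `is_local = true` so the values do not leak across pooled connections. The `SqlClient` interface and the `withTenant` helper are assumptions for illustration; the ADR does not name a database client.

```typescript
// Hypothetical minimal client interface; any driver exposing a
// parameterized query(text, params) method could be adapted to it.
interface SqlClient {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

interface TenantIds {
  orgId: string;
  workspaceId: string;
}

const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Runs fn inside a transaction whose RLS context is bound to one tenant.
async function withTenant<T>(
  client: SqlClient,
  ids: TenantIds,
  fn: (client: SqlClient) => Promise<T>
): Promise<T> {
  // Reject malformed identifiers before touching the database.
  if (!UUID_RE.test(ids.orgId) || !UUID_RE.test(ids.workspaceId)) {
    throw new Error('Tenant identifiers must be UUIDs');
  }
  await client.query('BEGIN');
  try {
    // is_local = true limits the setting to the current transaction.
    await client.query(
      "SELECT set_config('app.current_org_id', $1, true)",
      [ids.orgId]
    );
    await client.query(
      "SELECT set_config('app.current_workspace_id', $1, true)",
      [ids.workspaceId]
    );
    const result = await fn(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
}
```

Scoping the settings to the transaction means a connection returned to the pool carries no tenant context, so a later request that forgets to set one sees no rows rather than another tenant's rows.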
Layer 4: Vector Store Isolation
// Namespace isolation in RuVector
interface VectorNamespace {
// Namespace format: {org_id}/{workspace_id}/{collection}
// Example: "550e8400-e29b/.../episodic"
encode(orgId: string, workspaceId: string, collection: string): string;
decode(namespace: string): { orgId: string; workspaceId: string; collection: string };
validate(namespace: string, token: RuvBotToken): boolean;
}
// Vector store with tenant isolation
class TenantIsolatedVectorStore {
constructor(
private store: RuVectorAdapter,
private tenantContext: TenantContext
) {}
async search(query: Float32Array, options: SearchOptions): Promise<SearchResult[]> {
const namespace = this.getNamespace(options.collection);
// Validate namespace matches token claims
if (!this.validateNamespace(namespace)) {
throw new TenantIsolationError('Namespace mismatch');
}
return this.store.search(query, { ...options, namespace });
}
private getNamespace(collection: string): string {
return `${this.tenantContext.orgId}/${this.tenantContext.workspaceId}/${collection}`;
}
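private validateNamespace(namespace: string): boolean {
// VectorNamespace is an interface and has no static decode(), so split the
// {org_id}/{workspace_id}/{collection} string directly
const [orgId, workspaceId] = namespace.split('/');
return (
orgId === this.tenantContext.orgId &&
workspaceId === this.tenantContext.workspaceId
);
}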
}
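The `VectorNamespace` interface above declares `encode`, `decode`, and `validate` without an implementation. One way these could look is sketched below, assuming each segment is restricted to letters, digits, hyphens, and underscores so that a crafted collection name cannot introduce extra `/` separators. The function names and the segment rule are assumptions, not part of the ADR.

```typescript
interface NamespaceParts {
  orgId: string;
  workspaceId: string;
  collection: string;
}

// Assumed segment rule: no slashes, dots, or whitespace in any segment.
const SEGMENT_RE = /^[A-Za-z0-9_-]+$/;

function encodeNamespace(parts: NamespaceParts): string {
  for (const segment of [parts.orgId, parts.workspaceId, parts.collection]) {
    if (!SEGMENT_RE.test(segment)) {
      throw new Error(`Invalid namespace segment: ${segment}`);
    }
  }
  return `${parts.orgId}/${parts.workspaceId}/${parts.collection}`;
}

function decodeNamespace(namespace: string): NamespaceParts {
  const segments = namespace.split('/');
  if (segments.length !== 3 || !segments.every((s) => SEGMENT_RE.test(s))) {
    throw new Error(`Malformed namespace: ${namespace}`);
  }
  const [orgId, workspaceId, collection] = segments;
  return { orgId, workspaceId, collection };
}

// True only when the namespace is well formed and matches the token claims.
function namespaceBelongsTo(
  namespace: string,
  claims: { org_id: string; workspace_id: string }
): boolean {
  try {
    const decoded = decodeNamespace(namespace);
    return (
      decoded.orgId === claims.org_id &&
      decoded.workspaceId === claims.workspace_id
    );
  } catch {
    return false;
  }
}
```

Validating segments on both encode and decode keeps the check meaningful even when a namespace string arrives from outside the store wrapper, for example from a request parameter.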
Data Partitioning Strategy
PostgreSQL Partitioning
-- Partition conversations by org_id for isolation and performance
CREATE TABLE conversations (
id UUID NOT NULL DEFAULT gen_random_uuid(),
org_id UUID NOT NULL,
workspace_id UUID NOT NULL,
session_id UUID NOT NULL,
user_id UUID NOT NULL,
content TEXT NOT NULL,
role VARCHAR(20) NOT NULL,
embedding_id UUID,
metadata JSONB,
created_at TIMESTAMPTZ DEFAULT NOW(),
-- A primary key on a partitioned table must include the partition key column
PRIMARY KEY (org_id, id)
) PARTITION BY LIST (org_id);
-- Create partition per organization
CREATE OR REPLACE FUNCTION create_org_partition(org_id UUID)
RETURNS void AS $$
DECLARE
partition_name TEXT;
BEGIN
partition_name := 'conversations_' || replace(org_id::text, '-', '_');
EXECUTE format(
'CREATE TABLE IF NOT EXISTS %I PARTITION OF conversations FOR VALUES IN (%L)',
partition_name,
org_id
);
END;
$$ LANGUAGE plpgsql;
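-- Indexes declared on the partitioned parent are created on every partition.
-- CONCURRENTLY is not supported on a partitioned parent; to avoid blocking writes,
-- build each partition's index concurrently and attach it to the parent index.
CREATE INDEX conversations_session_idx
ON conversations (session_id, created_at DESC);
CREATE INDEX conversations_user_idx
ON conversations (user_id, created_at DESC);
CREATE INDEX conversations_embedding_idx
ON conversations (embedding_id) WHERE embedding_id IS NOT NULL;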
Vector Store Partitioning
// HNSW index per tenant for isolation and independent scaling
interface TenantVectorIndex {
orgId: string;
workspaceId: string;
collection: 'episodic' | 'semantic' | 'skills';
// Index configuration (can vary per tenant plan)
config: {
dimensions: number; // 384 for MiniLM, 1536 for larger models
m: number; // HNSW connections (16-32)
efConstruction: number; // Build quality (100-200)
efSearch: number; // Query quality (50-100)
};
// Usage metrics
metrics: {
vectorCount: number;
memoryUsageMB: number;
avgSearchLatencyMs: number;
lastOptimized: Date;
};
}
// Index lifecycle management
class TenantIndexManager {
async provisionTenant(orgId: string): Promise<void> {
// Create indices for the default workspace
await this.createIndex(orgId, 'default', 'episodic');
await this.createIndex(orgId, 'default', 'semantic');
await this.createIndex(orgId, 'default', 'skills');
}
async deleteTenant(orgId: string): Promise<void> {
// Delete all indices for org (GDPR deletion)
const indices = await this.listIndices(orgId);
await Promise.all(indices.map(idx => this.deleteIndex(idx.id)));
// Log deletion for audit
await this.auditLog.record({
action: 'tenant_deletion',
orgId,
indexCount: indices.length,
timestamp: new Date(),
});
}
async optimizeIndex(indexId: string): Promise<OptimizationResult> {
// Background optimization with tenant resource limits
const index = await this.getIndex(indexId);
const quota = await this.getQuota(index.orgId);
if (index.metrics.memoryUsageMB > quota.maxVectorMemoryMB) {
// Apply quantization to reduce memory
return this.compressIndex(indexId, 'product_quantization');
}
return this.rebalanceIndex(indexId);
}
}
Authentication Flows
OAuth2/OIDC Flow
+--------+                               +--------+
|  User  |                               |  IdP   |
+---+----+                               +---+----+
    |                                        |
    | 1. Login request                       |
    +--------------------------------------->|
    |                                        |
    | 2. Redirect to IdP                     |
    |<---------------------------------------+
    |                                        |
    | 3. Authenticate + consent              |
    +--------------------------------------->|
    |                                        |
    | 4. Auth code redirect                  |
    |<---------------------------------------+
    |                                        |
    |                   +--------+           |
    | 5. Auth code      | RuvBot |           |
    +------------------>|  Auth  |           |
    |                   +---+----+           |
    |                       |                |
    | 6. Exchange code      |                |
    |                       +--------------->|
    |                       |                |
    | 7. ID + Access token  |                |
    |                       |<---------------+
    |                       |                |
    | 8. Create session,    |
    |    issue RuvBot JWT   |
    |<----------------------+
    |                       |
    | 9. Authenticated      |
    |<----------------------+
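Step 8 of the flow issues a RuvBot JWT carrying the tenant claims described in Layer 2. The sketch below shows how the claim set might be assembled once the user's membership has been resolved. The `Membership` shape, the `buildTokenClaims` name, and the 15-minute lifetime are assumptions for illustration; signing is left to whichever JWT library the service uses.

```typescript
// Hypothetical result of resolving the authenticated user to a tenant.
interface Membership {
  userId: string;
  orgId: string;
  workspaceId: string;
  roles: string[];
  permissions: string[];
  quotas: { sessions: number; messages_per_day: number; memory_mb: number };
}

// Assumed lifetime; the ADR only asks for a short expiry.
const DEFAULT_TTL_SECONDS = 15 * 60;

// Builds the unsigned claim set matching the RuvBotToken structure.
function buildTokenClaims(
  membership: Membership,
  nowMs: number,
  ttlSeconds: number = DEFAULT_TTL_SECONDS
) {
  // JWT iat/exp values are expressed in seconds, not milliseconds.
  const iat = Math.floor(nowMs / 1000);
  return {
    sub: membership.userId,
    iat,
    exp: iat + ttlSeconds,
    org_id: membership.orgId,
    workspace_id: membership.workspaceId,
    roles: [...membership.roles],
    permissions: [...membership.permissions],
    quotas: { ...membership.quotas },
  };
}
```

Taking the current time as a parameter rather than calling `Date.now()` inside the function keeps the claim construction deterministic and easy to test.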
API Key Authentication
// API key structure
interface APIKey {
id: string;
keyHash: string; // SHA-256 hash of actual key
prefix: string; // First 8 chars for identification
orgId: string;
workspaceId: string;
name: string;
permissions: string[];
rateLimit: RateLimitConfig;
expiresAt: Date | null;
lastUsedAt: Date | null;
createdBy: string;
createdAt: Date;
}
// API key validation middleware
async function validateAPIKey(req: Request): Promise<TenantContext> {
const authHeader = req.headers.authorization;
if (!authHeader?.startsWith('Bearer ')) {
throw new AuthenticationError('Missing authorization header');
}
const key = authHeader.slice(7);
const prefix = key.slice(0, 8);
const keyHash = crypto.createHash('sha256').update(key).digest('hex');
// Lookup by prefix, then verify hash (timing-safe)
const apiKey = await db.apiKeys.findByPrefix(prefix);
if (!apiKey || !crypto.timingSafeEqual(
Buffer.from(apiKey.keyHash),
Buffer.from(keyHash)
)) {
throw new AuthenticationError('Invalid API key');
}
// Check expiration
if (apiKey.expiresAt && apiKey.expiresAt < new Date()) {
throw new AuthenticationError('API key expired');
}
// Update last used (async, don't block)
db.apiKeys.updateLastUsed(apiKey.id).catch(console.error);
return {
orgId: apiKey.orgId,
workspaceId: apiKey.workspaceId,
userId: apiKey.createdBy,
roles: [Role.API_KEY],
permissions: apiKey.permissions,
};
}
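The middleware above assumes keys were stored as a prefix plus a SHA-256 hash. A minimal sketch of the issuing side is shown below, using Node's built-in `crypto` module. The `generateApiKey` and `verifyApiKey` names, the 32-byte key length, and the hex encoding are assumptions; only the prefix length (8) and the hash algorithm come from the `APIKey` structure.

```typescript
import { createHash, randomBytes, timingSafeEqual } from 'node:crypto';

interface GeneratedApiKey {
  key: string;     // Shown to the caller once, never stored
  prefix: string;  // First 8 characters, stored for lookup
  keyHash: string; // Hex-encoded SHA-256 of the key, stored for verification
}

function generateApiKey(): GeneratedApiKey {
  // 32 random bytes rendered as 64 hex characters (assumed key format).
  const key = randomBytes(32).toString('hex');
  return {
    key,
    prefix: key.slice(0, 8),
    keyHash: createHash('sha256').update(key).digest('hex'),
  };
}

function verifyApiKey(presentedKey: string, storedHashHex: string): boolean {
  const presentedHash = createHash('sha256').update(presentedKey).digest();
  const storedHash = Buffer.from(storedHashHex, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return (
    storedHash.length === presentedHash.length &&
    timingSafeEqual(storedHash, presentedHash)
  );
}
```

Comparing the raw digest bytes rather than hex strings also makes the check independent of upper or lower case in the stored hash.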
Resource Quotas and Rate Limiting
Quota Configuration
// Plan-based quota tiers
interface ResourceQuotas {
// Session limits
maxConcurrentSessions: number;
maxSessionDurationMinutes: number;
maxTurnsPerSession: number;
// Memory limits
maxMemoriesPerWorkspace: number;
maxVectorStorageMB: number;
maxEmbeddingsPerDay: number;
// Compute limits
maxLLMTokensPerDay: number;
maxSkillExecutionsPerDay: number;
maxBackgroundJobsPerHour: number;
// Rate limits
requestsPerMinute: number;
requestsPerHour: number;
burstLimit: number;
}
const PLAN_QUOTAS: Record<Plan, ResourceQuotas> = {
free: {
maxConcurrentSessions: 2,
maxSessionDurationMinutes: 30,
maxTurnsPerSession: 50,
maxMemoriesPerWorkspace: 1000,
maxVectorStorageMB: 50,
maxEmbeddingsPerDay: 500,
maxLLMTokensPerDay: 10000,
maxSkillExecutionsPerDay: 100,
maxBackgroundJobsPerHour: 10,
requestsPerMinute: 20,
requestsPerHour: 500,
burstLimit: 5,
},
pro: {
maxConcurrentSessions: 10,
maxSessionDurationMinutes: 120,
maxTurnsPerSession: 500,
maxMemoriesPerWorkspace: 50000,
maxVectorStorageMB: 1000,
maxEmbeddingsPerDay: 10000,
maxLLMTokensPerDay: 500000,
maxSkillExecutionsPerDay: 5000,
maxBackgroundJobsPerHour: 200,
requestsPerMinute: 100,
requestsPerHour: 5000,
burstLimit: 20,
},
enterprise: {
maxConcurrentSessions: -1, // Unlimited
maxSessionDurationMinutes: -1,
maxTurnsPerSession: -1,
maxMemoriesPerWorkspace: -1,
maxVectorStorageMB: -1,
maxEmbeddingsPerDay: -1,
maxLLMTokensPerDay: -1,
maxSkillExecutionsPerDay: -1,
maxBackgroundJobsPerHour: -1,
requestsPerMinute: 500,
requestsPerHour: 20000,
burstLimit: 50,
},
};
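The enterprise tier uses `-1` to mean unlimited, so any code that compares usage against a quota has to special-case that value. A small sketch of such a check follows; `UNLIMITED`, `isWithinQuota`, and `remainingQuota` are illustrative names and not part of the ADR.

```typescript
// Sentinel used by PLAN_QUOTAS for unlimited resources.
const UNLIMITED = -1;

// True when consuming `requested` more units stays within the limit.
function isWithinQuota(
  limit: number,
  currentUsage: number,
  requested: number = 1
): boolean {
  if (limit === UNLIMITED) return true;
  return currentUsage + requested <= limit;
}

// Units left before the limit is reached; never negative.
function remainingQuota(limit: number, currentUsage: number): number {
  if (limit === UNLIMITED) return Number.POSITIVE_INFINITY;
  return Math.max(0, limit - currentUsage);
}
```

For example, with the free tier's `maxConcurrentSessions: 2`, `isWithinQuota(2, 1)` allows a second session and `isWithinQuota(2, 2)` rejects a third.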
Rate Limiter Implementation
// Token bucket rate limiter with Redis backend
class TenantRateLimiter {
constructor(private redis: Redis) {}
async checkLimit(
tenantId: string,
action: string,
config: RateLimitConfig
): Promise<RateLimitResult> {
const key = `ratelimit:${tenantId}:${action}`;
const now = Date.now();
const windowMs = config.windowMs || 60000;
// Lua script for atomic rate limit check
const result = await this.redis.eval(`
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local burst = tonumber(ARGV[4])
-- Remove expired entries
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
-- Count current requests
local count = redis.call('ZCARD', key)
-- Check burst limit (recent 1s)
local burstCount = redis.call('ZCOUNT', key, now - 1000, now)
if burstCount >= burst then
-- retryAfter is a relative delay in milliseconds, matching the window-limit branch below
return {0, limit - count, 0, 1000}
end
if count >= limit then
local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
local retryAfter = oldest[2] + window - now
return {0, 0, burst - burstCount, retryAfter}
end
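-- Add current request under a caller-supplied unique member, so two requests
-- in the same millisecond cannot overwrite each other
redis.call('ZADD', key, now, ARGV[5])
redis.call('PEXPIRE', key, window)
return {1, limit - count - 1, burst - burstCount - 1, 0}
`, 1, key, now, windowMs, config.limit, config.burstLimit, `${now}:${crypto.randomUUID()}`);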
const [allowed, remaining, burstRemaining, retryAfter] = result as number[];
return {
allowed: allowed === 1,
remaining,
burstRemaining,
retryAfter: retryAfter > 0 ? Math.ceil(retryAfter / 1000) : 0,
limit: config.limit,
};
}
}
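The Lua script above is a sliding-window log with an additional one-second burst check. To make that logic easier to follow and to unit-test without Redis, the same decision procedure can be sketched in memory. The `InMemorySlidingWindow` class and its types are illustrative assumptions and are not a replacement for the Redis-backed limiter, since they do not coordinate across processes.

```typescript
interface WindowConfig {
  limit: number;      // Max requests per window
  burstLimit: number; // Max requests in the most recent second
  windowMs: number;
}

interface Decision {
  allowed: boolean;
  remaining: number;
  retryAfterMs: number;
}

class InMemorySlidingWindow {
  private hits = new Map<string, number[]>();

  // `now` is passed in (milliseconds) and is assumed to be non-decreasing.
  check(key: string, now: number, cfg: WindowConfig): Decision {
    // Drop timestamps at or before the window boundary.
    const recent = (this.hits.get(key) ?? []).filter(
      (t) => t > now - cfg.windowMs
    );
    this.hits.set(key, recent);

    // Burst check over the most recent second.
    const burstCount = recent.filter((t) => t >= now - 1000).length;
    if (burstCount >= cfg.burstLimit) {
      return {
        allowed: false,
        remaining: cfg.limit - recent.length,
        retryAfterMs: 1000,
      };
    }

    // Window check: retry once the oldest entry leaves the window.
    if (recent.length >= cfg.limit) {
      const oldest = recent[0] ?? now;
      return {
        allowed: false,
        remaining: 0,
        retryAfterMs: oldest + cfg.windowMs - now,
      };
    }

    recent.push(now);
    return {
      allowed: true,
      remaining: cfg.limit - recent.length,
      retryAfterMs: 0,
    };
  }
}
```

Both rejection branches return a relative delay in milliseconds, which keeps the caller's conversion to a `Retry-After` value in seconds uniform.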
Audit Logging
// Comprehensive audit trail
interface AuditEvent {
id: string;
timestamp: Date;
// Tenant context
orgId: string;
workspaceId: string;
userId: string;
// Event details
action: AuditAction;
resource: AuditResource;
resourceId: string;
// Request context
requestId: string;
ipAddress: string;
userAgent: string;
// Change tracking
before?: Record<string, unknown>;
after?: Record<string, unknown>;
// Outcome
status: 'success' | 'failure' | 'denied';
errorCode?: string;
errorMessage?: string;
}
type AuditAction =
| 'create' | 'read' | 'update' | 'delete'
| 'login' | 'logout' | 'token_refresh'
| 'export' | 'import'
| 'share' | 'unshare'
| 'invite' | 'remove'
| 'skill_execute' | 'memory_recall'
| 'quota_exceeded' | 'rate_limited';
type AuditResource =
| 'user' | 'session' | 'conversation'
| 'memory' | 'skill' | 'agent'
| 'workspace' | 'organization'
| 'api_key' | 'webhook';
// Audit logger with async persistence
class AuditLogger {
private buffer: AuditEvent[] = [];
private flushInterval: NodeJS.Timeout;
constructor(
private storage: AuditStorage,
private config: { batchSize: number; flushMs: number }
) {
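// Catch flush errors so a storage outage does not surface as an unhandled rejection
this.flushInterval = setInterval(() => { this.flush().catch(console.error); }, config.flushMs);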
}
async log(event: Omit<AuditEvent, 'id' | 'timestamp'>): Promise<void> {
const fullEvent: AuditEvent = {
...event,
id: crypto.randomUUID(),
timestamp: new Date(),
};
this.buffer.push(fullEvent);
if (this.buffer.length >= this.config.batchSize) {
await this.flush();
}
}
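private async flush(): Promise<void> {
if (this.buffer.length === 0) return;
const events = this.buffer.splice(0, this.buffer.length);
try {
await this.storage.batchInsert(events);
} catch (error) {
// Re-queue the batch so a storage failure does not drop audit events
this.buffer.unshift(...events);
throw error;
}
}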
async query(filter: AuditFilter): Promise<AuditEvent[]> {
// Ensure tenant isolation in queries
if (!filter.orgId) {
throw new Error('orgId required for audit queries');
}
return this.storage.query(filter);
}
}
GDPR Compliance
Data Export
// Personal data export for GDPR Article 15 (right of access) and Article 20 (data portability)
class DataExporter {
async exportUserData(
orgId: string,
userId: string
): Promise<DataExportResult> {
// `export` is a reserved word in TypeScript, so the result is named exportData
const exportData = {
metadata: {
userId,
orgId,
exportedAt: new Date(),
format: 'json',
version: '1.0',
},
data: {} as Record<string, unknown>,
};
// Collect all user data across contexts
const [
profile,
sessions,
conversations,
memories,
preferences,
auditLogs,
] = await Promise.all([
this.exportProfile(userId),
this.exportSessions(userId),
this.exportConversations(userId),
this.exportMemories(userId),
this.exportPreferences(userId),
this.exportAuditLogs(userId),
]);
exportData.data = {
profile,
sessions,
conversations,
memories,
preferences,
auditLogs,
};
// Generate downloadable archive
const archivePath = await this.createArchive(exportData);
// Log export for audit
await this.auditLogger.log({
orgId,
workspaceId: '*',
userId,
action: 'export',
resource: 'user',
resourceId: userId,
status: 'success',
});
return {
downloadUrl: await this.generateSignedUrl(archivePath),
expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000), // 24h
sizeBytes: await this.getFileSize(archivePath),
};
}
}
Data Deletion
// Right to erasure (GDPR Article 17)
class DataDeleter {
async deleteUserData(
orgId: string,
userId: string,
options: DeletionOptions = {}
): Promise<DeletionResult> {
const jobId = crypto.randomUUID();
// Start deletion job (may take time for large datasets)
await this.jobQueue.enqueue('data-deletion', {
jobId,
orgId,
userId,
options,
});
return {
jobId,
status: 'pending',
estimatedCompletionTime: await this.estimateCompletionTime(userId),
};
}
async executeDeletion(job: DeletionJob): Promise<void> {
const { orgId, userId, options } = job.data;
// Order matters: delete dependent data first
const steps = [
{ name: 'sessions', fn: () => this.deleteSessions(userId) },
{ name: 'conversations', fn: () => this.deleteConversations(userId) },
{ name: 'memories', fn: () => this.deleteMemories(userId, options.preserveShared) },
{ name: 'embeddings', fn: () => this.deleteEmbeddings(userId) },
{ name: 'trajectories', fn: () => this.deleteTrajectories(userId) },
{ name: 'preferences', fn: () => this.deletePreferences(userId) },
{ name: 'audit_logs', fn: () => this.anonymizeAuditLogs(userId) }, // Anonymize, not delete
{ name: 'profile', fn: () => this.deleteProfile(userId) },
];
for (const step of steps) {
try {
const result = await step.fn();
await this.updateProgress(job.id, step.name, 'completed', result);
} catch (error) {
await this.updateProgress(job.id, step.name, 'failed', error);
throw error; // Fail job, require manual intervention
}
}
// Final audit entry (anonymized user reference)
await this.auditLogger.log({
orgId,
workspaceId: '*',
userId: 'DELETED_USER',
action: 'delete',
resource: 'user',
resourceId: userId.slice(0, 8) + '...',
status: 'success',
});
}
}
Consequences
Benefits
- Strong Isolation: RLS + namespace isolation prevents cross-tenant access
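- Compliance Ready: Isolation, audit logging, export, and erasure controls address the data-handling requirements of GDPR, SOC2, and HIPAA named in the problem statement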
- Scalable Quotas: Per-tenant resource limits enable fair usage
- Audit Trail: Complete visibility for security and compliance
- Flexible Auth: OAuth2 + API keys support various use cases
Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| RLS bypass via SQL injection | Low | Critical | Parameterized queries, ORM only |
| Token theft | Medium | High | Short expiry, refresh rotation |
| Quota gaming (multiple accounts) | Medium | Medium | Device fingerprinting, email verification |
| Audit log tampering | Low | High | Append-only storage, checksums |
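
The "append-only storage, checksums" mitigation in the last row can be illustrated with a hash chain, where each record's checksum covers the previous record's checksum. This is a minimal sketch of one common technique, not a description of how RuvBot stores audit events; `ChainedRecord`, `appendRecord`, and `verifyChain` are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

interface ChainedRecord {
  payload: string;  // Serialized audit event
  prevHash: string; // Hash of the preceding record
  hash: string;     // SHA-256 over prevHash followed by payload
}

// Fixed starting value for the first record in a chain.
const GENESIS_HASH = '0'.repeat(64);

function computeHash(prevHash: string, payload: string): string {
  return createHash('sha256').update(prevHash).update(payload).digest('hex');
}

// Returns a new chain with the payload appended; the input is not mutated.
function appendRecord(chain: ChainedRecord[], payload: string): ChainedRecord[] {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS_HASH;
  return [...chain, { payload, prevHash, hash: computeHash(prevHash, payload) }];
}

// Recomputes every link; any edited, removed, or reordered record breaks it.
function verifyChain(chain: ChainedRecord[]): boolean {
  let prevHash = GENESIS_HASH;
  for (const record of chain) {
    if (record.prevHash !== prevHash) return false;
    if (record.hash !== computeHash(prevHash, record.payload)) return false;
    prevHash = record.hash;
  }
  return true;
}
```

A chain like this detects tampering only if the latest hash is also kept somewhere the attacker cannot rewrite, which is where the append-only storage half of the mitigation comes in.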
Related Decisions
- ADR-001: Architecture Overview
- ADR-003: Persistence Layer (RLS implementation details)
Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-01-27 | RuVector Architecture Team | Initial version |