# ADR-011: Swarm Coordination (agentic-flow Integration) ## Status Accepted (Implemented) ## Date 2026-01-27 ## Context Clawdbot has basic async processing. RuvBot integrates agentic-flow patterns for: - Multi-agent swarm coordination - 12 specialized background workers - Byzantine fault-tolerant consensus - Dynamic topology switching ## Decision ### Swarm Architecture ``` ┌─────────────────────────────────────────────────────────────────┐ │ RuvBot Swarm Coordination │ ├─────────────────────────────────────────────────────────────────┤ │ Topologies │ │ ├─ hierarchical : Queen-worker (anti-drift) │ │ ├─ mesh : Peer-to-peer network │ │ ├─ hierarchical-mesh : Hybrid for scalability │ │ └─ adaptive : Dynamic switching │ ├─────────────────────────────────────────────────────────────────┤ │ Consensus Protocols │ │ ├─ byzantine : BFT (f < n/3 faulty) │ │ ├─ raft : Leader-based (f < n/2) │ │ ├─ gossip : Eventually consistent │ │ └─ crdt : Conflict-free replication │ ├─────────────────────────────────────────────────────────────────┤ │ Background Workers (12) │ │ ├─ ultralearn [normal] : Deep knowledge acquisition │ │ ├─ optimize [high] : Performance optimization │ │ ├─ consolidate [low] : Memory consolidation (EWC++) │ │ ├─ predict [normal] : Predictive preloading │ │ ├─ audit [critical] : Security analysis │ │ ├─ map [normal] : Codebase mapping │ │ ├─ preload [low] : Resource preloading │ │ ├─ deepdive [normal] : Deep code analysis │ │ ├─ document [normal] : Auto-documentation │ │ ├─ refactor [normal] : Refactoring suggestions │ │ ├─ benchmark [normal] : Performance benchmarking │ │ └─ testgaps [normal] : Test coverage analysis │ └─────────────────────────────────────────────────────────────────┘ ``` ### Implementation Located in `/npm/packages/ruvbot/src/swarm/`: - `SwarmCoordinator.ts` - Main coordinator with task dispatch - `ByzantineConsensus.ts` - PBFT-style consensus implementation ### SwarmCoordinator ```typescript interface SwarmConfig { topology: SwarmTopology; // 'hierarchical' | 'mesh' | 'hierarchical-mesh' | 'adaptive' maxAgents: number; // default: 8 strategy: 'specialized' | 'balanced' | 'adaptive'; consensus: ConsensusProtocol; // 'byzantine' | 'raft' | 'gossip' | 'crdt' heartbeatInterval?: number; // default: 5000ms taskTimeout?: number; // default: 60000ms } interface SwarmTask { id: string; worker: WorkerType; type: string; content: unknown; priority: WorkerPriority; status: 'pending' | 'running' | 'completed' | 'failed'; assignedAgent?: string; result?: unknown; error?: string; createdAt: Date; startedAt?: Date; completedAt?: Date; } interface SwarmAgent { id: string; type: WorkerType; status: 'idle' | 'busy' | 'offline'; currentTask?: string; completedTasks: number; failedTasks: number; lastHeartbeat: Date; } ``` ### Worker Configuration ```typescript const WORKER_DEFAULTS: Record = { ultralearn: { priority: 'normal', concurrency: 2, timeout: 60000, retries: 3, backoff: 'exponential' }, optimize: { priority: 'high', concurrency: 4, timeout: 30000, retries: 2, backoff: 'exponential' }, consolidate: { priority: 'low', concurrency: 1, timeout: 120000, retries: 1, backoff: 'linear' }, predict: { priority: 'normal', concurrency: 2, timeout: 15000, retries: 2, backoff: 'exponential' }, audit: { priority: 'critical', concurrency: 1, timeout: 45000, retries: 3, backoff: 'exponential' }, map: { priority: 'normal', concurrency: 2, timeout: 60000, retries: 2, backoff: 'linear' }, preload: { priority: 'low', concurrency: 4, timeout: 10000, retries: 1, backoff: 'linear' }, deepdive: { priority: 'normal', concurrency: 2, timeout: 90000, retries: 2, backoff: 'exponential' }, document: { priority: 'normal', concurrency: 2, timeout: 30000, retries: 2, backoff: 'linear' }, refactor: { priority: 'normal', concurrency: 2, timeout: 60000, retries: 2, backoff: 'exponential' }, benchmark: { priority: 'normal', concurrency: 1, timeout: 120000, retries: 1, backoff: 'linear' }, testgaps: { priority: 'normal', concurrency: 2, timeout: 45000, retries: 2, backoff: 'linear' }, }; ``` ### ByzantineConsensus (PBFT) ```typescript interface ConsensusConfig { replicas: number; // Total number of replicas (default: 5) timeout: number; // Timeout per phase (default: 30000ms) retries: number; // Retries before failing (default: 3) requireSignatures: boolean; } // Fault tolerance: f < n/3 // Quorum size: ceil(2n/3) ``` **Phases:** 1. `pre-prepare` - Leader broadcasts proposal 2. `prepare` - Replicas validate and send prepare messages 3. `commit` - Wait for quorum of commit messages 4. `decided` - Consensus reached 5. `failed` - Consensus failed (timeout/Byzantine fault) ### Usage Example ```typescript import { SwarmCoordinator, ByzantineConsensus } from './swarm'; // Initialize swarm const swarm = new SwarmCoordinator({ topology: 'hierarchical', maxAgents: 8, strategy: 'specialized', consensus: 'raft' }); await swarm.start(); // Spawn specialized agents await swarm.spawnAgent('ultralearn'); await swarm.spawnAgent('optimize'); // Dispatch task const task = await swarm.dispatch({ worker: 'ultralearn', task: { type: 'deep-analysis', content: 'analyze this' }, priority: 'normal' }); // Wait for completion const result = await swarm.waitForTask(task.id); // Byzantine consensus for critical decisions const consensus = new ByzantineConsensus({ replicas: 5, timeout: 30000 }); consensus.initializeReplicas(['node1', 'node2', 'node3', 'node4', 'node5']); const decision = await consensus.propose({ action: 'deploy', version: '1.0.0' }); ``` ### Events SwarmCoordinator emits: - `started`, `stopped` - `agent:spawned`, `agent:removed`, `agent:offline` - `task:created`, `task:assigned`, `task:completed`, `task:failed` ByzantineConsensus emits: - `proposal:created` - `phase:pre-prepare`, `phase:prepare`, `phase:commit` - `vote:received` - `consensus:decided`, `consensus:failed`, `consensus:no-quorum` - `replica:faulty`, `view:changed` ## Consequences ### Positive - Distributed task execution with priority queues - Fault tolerance via PBFT consensus - Specialized workers for different task types - Heartbeat-based health monitoring - Event-driven architecture ### Negative - Coordination overhead - Complexity of distributed systems - Memory overhead for task/agent tracking ### RuvBot Advantages over Clawdbot - 12 specialized workers vs basic async - Byzantine fault tolerance vs none - Multi-topology support vs single-threaded - Learning workers (ultralearn, consolidate) vs static - Priority-based task scheduling