Files
wifi-densepose/vendor/ruvector/crates/ruvector-postgres/src/routing

Tiny Dancer Routing Module

Neural-powered dynamic agent routing with FastGRNN for intelligent AI agent selection.

Overview

The Tiny Dancer routing module provides intelligent routing of requests to AI agents based on multiple optimization criteria including cost, latency, quality, and balanced performance. It uses a FastGRNN (Fast Gated Recurrent Neural Network) for adaptive decision-making.

Architecture

Components

  1. FastGRNN (fastgrnn.rs)

    • Lightweight gated recurrent neural network
    • Real-time routing decisions with minimal compute
    • Adaptive learning from routing patterns
  2. Agent Registry (agents.rs)

    • Thread-safe agent storage with DashMap
    • Capability-based agent discovery
    • Performance metrics tracking
  3. Router (router.rs)

    • Multi-objective optimization
    • Constraint-based filtering
    • Neural-enhanced confidence scoring
  4. PostgreSQL Operators (operators.rs)

    • SQL functions for agent management
    • Routing query interface
    • Statistics and monitoring

PostgreSQL Functions

Agent Registration

-- Register a simple agent
SELECT ruvector_register_agent(
    'gpt-4',                    -- Agent name
    'llm',                      -- Agent type
    ARRAY['code_generation', 'reasoning'],  -- Capabilities
    0.03,                       -- Cost per request ($)
    500.0,                      -- Average latency (ms)
    0.95                        -- Quality score (0-1)
);

-- Register with full configuration
SELECT ruvector_register_agent_full('{
    "name": "claude-3-opus",
    "agent_type": "llm",
    "capabilities": ["coding", "reasoning", "writing"],
    "cost_model": {
        "per_request": 0.025,
        "per_token": 0.00005
    },
    "performance": {
        "avg_latency_ms": 400.0,
        "quality_score": 0.93,
        "success_rate": 0.99,
        "p95_latency_ms": 600.0,
        "p99_latency_ms": 1000.0
    },
    "is_active": true
}'::jsonb);

Routing Requests

-- Basic routing (optimize for balanced performance)
SELECT ruvector_route(
    embedding_vector,           -- Request embedding (384-dim)
    'balanced',                 -- Optimization target
    NULL                        -- No constraints
)
FROM requests
WHERE id = 123;

-- Cost-optimized routing with constraints
SELECT ruvector_route(
    embedding_vector,
    'cost',
    '{"max_latency_ms": 1000.0, "min_quality": 0.8}'::jsonb
)
FROM requests
WHERE id = 456;

-- Quality-optimized with capability requirements
SELECT ruvector_route(
    embedding_vector,
    'quality',
    '{
        "max_cost": 0.1,
        "required_capabilities": ["code_generation", "debugging"],
        "excluded_agents": ["slow-agent"]
    }'::jsonb
);

-- Latency-optimized routing
SELECT ruvector_route(
    embedding_vector,
    'latency',
    '{"max_latency_ms": 500.0}'::jsonb
);

Agent Management

-- List all agents
SELECT * FROM ruvector_list_agents();

-- Get specific agent details
SELECT ruvector_get_agent('gpt-4');

-- Find agents by capability
SELECT * FROM ruvector_find_agents_by_capability('code_generation', 5);

-- Update agent performance metrics
SELECT ruvector_update_agent_metrics(
    'gpt-4',                    -- Agent name
    450.0,                      -- Observed latency (ms)
    true,                       -- Success
    0.92                        -- Quality score (optional)
);

-- Deactivate an agent
SELECT ruvector_set_agent_active('gpt-4', false);

-- Remove an agent
SELECT ruvector_remove_agent('old-agent');

-- Get routing statistics
SELECT ruvector_routing_stats();

Usage Examples

Example 1: Multi-Model Routing System

-- Register various AI models
SELECT ruvector_register_agent('gpt-4', 'llm',
    ARRAY['coding', 'reasoning', 'math'], 0.03, 500.0, 0.95);
SELECT ruvector_register_agent('gpt-3.5-turbo', 'llm',
    ARRAY['general', 'fast'], 0.002, 200.0, 0.75);
SELECT ruvector_register_agent('claude-3-opus', 'llm',
    ARRAY['coding', 'writing', 'analysis'], 0.025, 400.0, 0.93);
SELECT ruvector_register_agent('llama-2-70b', 'llm',
    ARRAY['local', 'private'], 0.0, 800.0, 0.72);

-- Create routing view
CREATE VIEW intelligent_routing AS
SELECT
    r.id,
    r.query_text,
    r.embedding,
    route.agent_name,
    route.confidence,
    route.estimated_cost,
    route.estimated_latency_ms,
    route.expected_quality,
    route.reasoning
FROM requests r,
LATERAL (
    SELECT (ruvector_route(
        r.embedding,
        'balanced',
        NULL
    ))::jsonb AS route_data
) route_query,
LATERAL jsonb_to_record(route_query.route_data) AS route(
    agent_name text,
    confidence float4,
    estimated_cost float4,
    estimated_latency_ms float4,
    expected_quality float4,
    similarity_score float4,
    reasoning text
);

-- Query with automatic routing
SELECT * FROM intelligent_routing WHERE id = 123;

Example 2: Cost-Aware Batch Processing

-- Process batch with cost constraints
CREATE TEMP TABLE batch_results AS
SELECT
    r.id,
    r.query_text,
    routing.agent_name,
    routing.estimated_cost,
    routing.expected_quality
FROM requests r
CROSS JOIN LATERAL (
    SELECT (ruvector_route(
        r.embedding,
        'cost',
        '{"max_cost": 0.01, "min_quality": 0.7}'::jsonb
    ))::jsonb->'agent_name' AS agent_name,
    (ruvector_route(
        r.embedding,
        'cost',
        '{"max_cost": 0.01, "min_quality": 0.7}'::jsonb
    ))::jsonb->'estimated_cost' AS estimated_cost,
    (ruvector_route(
        r.embedding,
        'cost',
        '{"max_cost": 0.01, "min_quality": 0.7}'::jsonb
    ))::jsonb->'expected_quality' AS expected_quality
) routing
WHERE r.processed = false
LIMIT 1000;

-- Calculate total estimated cost
SELECT
    SUM((estimated_cost)::float) AS total_cost,
    AVG((expected_quality)::float) AS avg_quality,
    COUNT(*) AS total_requests
FROM batch_results;

Example 3: Quality-First Routing

-- Route critical requests to highest quality agents
CREATE FUNCTION route_critical_request(
    request_embedding float4[],
    min_quality float4 DEFAULT 0.9
) RETURNS jsonb AS $$
    SELECT ruvector_route(
        request_embedding,
        'quality',
        jsonb_build_object(
            'min_quality', min_quality,
            'max_latency_ms', 2000.0,
            'required_capabilities', ARRAY['reasoning', 'analysis']
        )
    );
$$ LANGUAGE SQL;

-- Use the function
SELECT route_critical_request(embedding_vector, 0.95)
FROM critical_requests
WHERE priority = 'high';

Example 4: Real-time Performance Tracking

-- Update metrics after each request
CREATE FUNCTION record_agent_performance(
    agent_name text,
    actual_latency_ms float4,
    success boolean,
    quality_score float4
) RETURNS void AS $$
BEGIN
    PERFORM ruvector_update_agent_metrics(
        agent_name,
        actual_latency_ms,
        success,
        quality_score
    );
END;
$$ LANGUAGE plpgsql;

-- Trigger to auto-update metrics
CREATE TRIGGER update_agent_metrics_trigger
AFTER INSERT ON request_completions
FOR EACH ROW
EXECUTE FUNCTION record_agent_performance(
    NEW.agent_name,
    NEW.latency_ms,
    NEW.success,
    NEW.quality_score
);

Example 5: Capability-Based Routing

-- Create specialized routing functions
CREATE FUNCTION route_code_request(emb float4[]) RETURNS text AS $$
    SELECT (ruvector_route(
        emb,
        'quality',
        '{"required_capabilities": ["coding", "debugging"]}'::jsonb
    ))::jsonb->>'agent_name';
$$ LANGUAGE SQL;

CREATE FUNCTION route_writing_request(emb float4[]) RETURNS text AS $$
    SELECT (ruvector_route(
        emb,
        'quality',
        '{"required_capabilities": ["writing", "editing"]}'::jsonb
    ))::jsonb->>'agent_name';
$$ LANGUAGE SQL;

-- Use in application logic
SELECT
    CASE
        WHEN task_type = 'code' THEN route_code_request(embedding)
        WHEN task_type = 'write' THEN route_writing_request(embedding)
        ELSE (ruvector_route(embedding, 'balanced', NULL))::jsonb->>'agent_name'
    END AS selected_agent
FROM tasks;

Optimization Targets

Cost

  • Minimizes cost per request
  • Considers both per-request and per-token costs
  • Ideal for high-volume, cost-sensitive workloads

Latency

  • Minimizes response time
  • Uses average latency metrics
  • Best for real-time applications

Quality

  • Maximizes quality score
  • Based on historical performance
  • Recommended for critical tasks

Balanced

  • Multi-objective optimization
  • Balances cost, latency, quality, and similarity
  • Default for general-purpose routing

Constraints

max_cost

Maximum acceptable cost per request (in dollars)

max_latency_ms

Maximum acceptable latency in milliseconds

min_quality

Minimum required quality score (0-1 scale)

required_capabilities

Array of required agent capabilities

excluded_agents

Array of agent names to exclude from selection

Performance Considerations

  1. Agent Registry: Thread-safe with DashMap for concurrent access
  2. Embedding Similarity: Uses fast cosine similarity for request matching
  3. FastGRNN: Lightweight neural network for real-time inference
  4. Caching: Consider caching routing decisions for identical requests

Monitoring

-- View agent statistics
SELECT name, total_requests, avg_latency_ms, quality_score, success_rate
FROM ruvector_list_agents()
ORDER BY total_requests DESC;

-- Get overall routing statistics
SELECT ruvector_routing_stats();

-- Find underperforming agents
SELECT name, success_rate, quality_score
FROM ruvector_list_agents()
WHERE success_rate < 0.95
   OR quality_score < 0.7;

Best Practices

  1. Register Accurate Metrics: Keep agent performance metrics up-to-date
  2. Use Constraints: Always set appropriate constraints for production
  3. Monitor Performance: Track actual vs. estimated metrics
  4. Update Regularly: Use ruvector_update_agent_metrics after each request
  5. Capability Matching: Ensure agents have accurate capability tags
  6. Cost Tracking: Monitor total routing costs with statistics queries

Integration with Other Modules

The routing module integrates seamlessly with:

  • Vector Search: Use query embeddings for semantic routing
  • GNN: Enhance routing with graph neural networks
  • Quantization: Reduce embedding storage costs
  • HNSW Index: Fast similarity search for agent selection

Future Enhancements

  • A/B testing framework for agent comparison
  • Multi-armed bandit algorithms for exploration
  • Reinforcement learning for adaptive routing
  • Cost prediction models
  • Load balancing across agent instances
  • Geo-distributed agent routing