# Neuro-Symbolic Query Execution - Implementation Plan

## Overview

### Problem Statement

Current vector search in ruvector is purely neural (similarity-based): given a query vector, find the k most similar vectors by cosine/Euclidean distance. However, real-world queries often involve **logical constraints** that pure vector similarity cannot express.

**Examples of Unsupported Queries:**

- "Find vectors similar to X **AND** published after 2023 **AND** tagged as 'research'"
- "Find vectors similar to X **OR** similar to Y, **EXCLUDING** category 'spam'"
- "Find vectors where `metadata.price < 100` **AND** similarity > 0.8"
- "Find vectors in graph community C **AND** within 2 hops of node N"

**Current Limitations:**

- No support for boolean logic (AND, OR, NOT)
- Cannot filter by metadata attributes
- Cannot combine vector similarity with graph structure
- Forces inefficient post-processing filtering
- No way to express complex multi-modal queries

**Performance Impact:**

- Retrieving 10,000 vectors and then filtering down to 10 wastes 99.9% of the computation
- No index acceleration for metadata predicates
- Filters cannot be pushed down into HNSW search

### Proposed Solution

**Neuro-Symbolic Query Execution**: a hybrid query engine that combines neural vector similarity with symbolic logical constraints.

**Key Components:**

1. **Query Language**: Extend existing Cypher/SQL support with vector similarity operators
2. **Hybrid Scoring**: Combine vector similarity scores with predicate satisfaction
3. **Filter Pushdown**: Apply logical constraints during HNSW search (not after)
4. **Multi-Modal Indexing**: Index metadata attributes alongside vectors
5.
**Constraint Propagation**: Use graph structure to prune the search space

**Architecture:**

```
Query: "MATCH (v:Vector)
        WHERE vector_similarity(v.embedding, $query) > 0.8
          AND v.year >= 2023
          AND v.category IN ['research', 'papers']
        RETURN v ORDER BY similarity DESC LIMIT 10"
                        ↓
                 Parse & Optimize
                        ↓
  Neural Component:               Symbolic Component:
  vector_similarity > 0.8        year >= 2023 AND category IN [...]
          ↓                                 ↓
     HNSW Search                      Metadata Index
          └───────────── Merge ─────────────┘
                        ↓
     Hybrid Scoring (α * neural + β * symbolic)
                        ↓
                  Top-K Results
```

### Expected Benefits

**Quantified Performance Improvements:**

| Query Type | Current (Post-Filter) | Neuro-Symbolic | Improvement |
|------------|----------------------|----------------|-------------|
| Similarity + 1 filter | 50ms (10K retrieved) | 5ms (100 retrieved) | **10x faster** |
| Similarity + 3 filters | 200ms (50K retrieved) | 8ms (200 retrieved) | **25x faster** |
| Complex boolean logic | Not supported | 15ms | **∞** (new capability) |
| Multi-modal query | Manual joins | 20ms | **50x faster** |

**Qualitative Benefits:**

- Express complex queries naturally (no manual post-processing)
- Efficient execution with filter pushdown
- Support for real-world use cases (e-commerce, research, RAG)
- Better accuracy through multi-modal fusion
- Graph-aware queries (community detection, path constraints)

## Technical Design

### Architecture Diagram (ASCII Art)

```
Neuro-Symbolic Query Execution Pipeline

User Query (SQL/Cypher + Vector Similarity)
  Example: "SELECT * FROM vectors
            WHERE cosine_similarity(embedding, $query) > 0.8
              AND category = 'research' AND year >= 2023
            ORDER BY similarity DESC LIMIT 10"
      ↓
Query Parser & AST Builder
  Parse query into Abstract Syntax Tree
  (AST):
    SELECT
    WHERE
      AND
      ├─ cosine_similarity(emb, $q) > 0.8   [NEURAL]
      ├─ category = 'research'              [SYMBOLIC]
      └─ year >= 2023                       [SYMBOLIC]
    ORDER BY similarity DESC
    LIMIT 10
      ↓
Query Optimizer
  Analyze predicates and rewrite query for efficiency
  1. Predicate Pushdown:
     Move filters into HNSW search (before candidate generation)
  2. Index Selection:
     Choose best index for symbolic predicates
       - category: inverted index
       - year: range index (B-tree)
  3. Execution Strategy:
     - If few categories: scan category index first
     - If similarity selective: HNSW first, then filter
     - If balanced: hybrid merge
  4.
Hybrid Scoring:
     score = α * neural_sim + β * symbolic_score
      ↓
Execution Plan
  Step 1: HNSW Search (neural)
    - Target: similarity > 0.8
    - Candidate pool: ef=200
    - Early termination: collect ~100 candidates
    - Filter during search: year >= 2023
    Output: {node_id, similarity} for ~100 candidates
  Step 2: Symbolic Filtering (metadata index)
    - Lookup category index: category = 'research'
    - Intersect with HNSW candidates
    Output: {node_id, similarity, metadata} for ~30 nodes
  Step 3: Hybrid Scoring
    - Compute symbolic_score (e.g., recency bonus)
    - Combined: 0.7 * similarity + 0.3 * symbolic_score
    Output: {node_id, hybrid_score}
  Step 4: Top-K Selection
    - Sort by hybrid_score DESC
    - Return top 10
      ↓
Result Set
  [{id: 42, similarity: 0.95, category: 'research', year: 2024},
   {id: 137, similarity: 0.92, category: 'research', year: 2023},
   ...]
```
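The four-step plan above can be simulated with a tiny standalone sketch. Plain Python dictionaries stand in for the HNSW candidates, the category index, and the recency bonus; all names and numbers are illustrative, not the real ruvector API:

```python
# Hypothetical miniature of the 4-step plan:
# filtered candidates -> index intersection -> hybrid scoring -> top-k.

# Step 1: candidates from HNSW search (node_id -> similarity),
# already filtered by "year >= 2023" during traversal.
candidates = {42: 0.95, 137: 0.92, 7: 0.85, 99: 0.81}

# Step 2: symbolic filtering via a category index (category -> node ids).
category_index = {"research": {42, 137, 7}, "spam": {99}}
survivors = {n: s for n, s in candidates.items() if n in category_index["research"]}

# Step 3: hybrid scoring: 0.7 * similarity + 0.3 * symbolic score (recency bonus).
recency_bonus = {42: 1.0, 137: 0.5, 7: 0.0}
hybrid = {n: 0.7 * s + 0.3 * recency_bonus[n] for n, s in survivors.items()}

# Step 4: top-k selection by hybrid score.
top2 = sorted(hybrid, key=hybrid.get, reverse=True)[:2]
print(top2)  # → [42, 137]
```

Note how node 99 never reaches the scoring step: the symbolic filter removes it before any hybrid score is computed, which is the whole point of pushing filters ahead of expensive work.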
```
Indexing & Storage Architecture

Vector Data:
  HNSW Index (vector similarity)
    - Node ID → Embedding vector
    - Graph structure for approximate NN search

Metadata Data:
  Inverted Index (categorical attributes)
    - category → {node_ids}
    - tag → {node_ids}
    - author → {node_ids}
  B-Tree Index (range attributes)
    - year → sorted {node_ids}
    - price → sorted {node_ids}
    - timestamp → sorted {node_ids}
  Roaring Bitmap Index (set operations)
    - Efficient AND/OR/NOT on node ID sets
    - Compressed storage for sparse sets
  Graph Index (structural constraints)
    - Community membership: community_id → {node_ids}
    - k-hop neighborhoods: precomputed for common queries
    - Path constraints: shortest path caches
```

### Core Data Structures (Rust)

```rust
// File: crates/ruvector-query/src/neuro_symbolic/mod.rs

use std::collections::{HashMap, HashSet};
use serde::{Deserialize, Serialize};

/// Neuro-symbolic query execution engine
pub struct NeuroSymbolicEngine {
    /// HNSW index for vector similarity
    hnsw_index: Arc<HnswIndex>,
    /// Metadata indexes (inverted, B-tree,
    /// etc.)
    metadata_indexes: MetadataIndexes,
    /// Query optimizer
    optimizer: QueryOptimizer,
    /// Execution planner
    planner: ExecutionPlanner,
    /// Hybrid scoring configuration
    scoring_config: HybridScoringConfig,
}

/// Query representation (SQL/Cypher AST)
#[derive(Debug, Clone)]
pub struct Query {
    /// SELECT clause (which fields to return)
    pub select: Vec<String>,
    /// WHERE clause (predicates)
    pub where_clause: Option<Predicate>,
    /// ORDER BY clause
    pub order_by: Vec<OrderBy>,
    /// LIMIT clause
    pub limit: Option<usize>,
    /// OFFSET clause
    pub offset: Option<usize>,
}

/// Predicate tree (boolean logic)
#[derive(Debug, Clone)]
pub enum Predicate {
    /// Neural predicate: vector similarity
    VectorSimilarity {
        field: String,
        query_vector: Vec<f32>,
        operator: ComparisonOp,   // >, <, =
        threshold: f32,
        metric: SimilarityMetric, // cosine, euclidean, dot
    },
    /// Symbolic predicate: metadata constraint
    Attribute {
        field: String,
        operator: ComparisonOp,
        value: Value,
    },
    /// Graph predicate: structural constraint
    Graph {
        constraint: GraphConstraint,
    },
    /// Boolean operators
    And(Box<Predicate>, Box<Predicate>),
    Or(Box<Predicate>, Box<Predicate>),
    Not(Box<Predicate>),
}

#[derive(Debug, Clone)]
pub enum GraphConstraint {
    /// Node in community
    InCommunity { community_id: u32 },
    /// Within k hops of node
    WithinKHops { source_node: u32, k: usize },
    /// On path between two nodes
    OnPath { source: u32, target: u32 },
    /// Has edge to node
    ConnectedTo { node_id: u32 },
}

#[derive(Debug, Clone, Copy)]
pub enum ComparisonOp {
    Eq, // =
    Ne, // !=
    Lt, // <
    Le, // <=
    Gt, // >
    Ge, // >=
    In, // IN (...)
    Like, // LIKE (string pattern)
}

#[derive(Debug, Clone)]
pub enum Value {
    Int(i64),
    Float(f64),
    String(String),
    Bool(bool),
    List(Vec<Value>),
}

#[derive(Debug, Clone, Copy)]
pub enum SimilarityMetric {
    Cosine,
    Euclidean,
    DotProduct,
    L1,
}

/// Metadata indexing structures
pub struct MetadataIndexes {
    /// Inverted indexes for categorical fields
    inverted: HashMap<String, InvertedIndex>,
    /// B-tree indexes for range queries
    btree: HashMap<String, BTreeIndex>,
    /// Roaring bitmap for set operations
    bitmap_store: BitmapStore,
    /// Graph structural indexes
    graph_index: GraphStructureIndex,
}

/// Inverted index: field_value → {node_ids}
pub struct InvertedIndex {
    /// Map from value to posting list (node IDs)
    postings: HashMap<Value, RoaringBitmap>,
    /// Statistics for query optimization
    stats: IndexStats,
}

/// B-tree index for range queries
pub struct BTreeIndex {
    /// Sorted map from value to node IDs
    tree: BTreeMap<OrderedValue, RoaringBitmap>,
    /// Statistics
    stats: IndexStats,
}

/// Roaring bitmap store for efficient set operations
pub struct BitmapStore {
    /// Node ID sets as compressed bitmaps
    bitmaps: HashMap<String, RoaringBitmap>,
}

/// Graph structure indexes
pub struct GraphStructureIndex {
    /// Community assignments
    communities: HashMap<u32, RoaringBitmap>,
    /// k-hop neighborhoods (precomputed)
    khop_cache: HashMap<(u32, usize), RoaringBitmap>,
    /// Shortest path cache
    path_cache: PathCache,
}

#[derive(Debug, Default)]
pub struct IndexStats {
    pub num_unique_values: usize,
    pub total_postings: usize,
    pub avg_posting_length: f64,
    pub selectivity: f64, // fraction of nodes matching
}

/// Query execution plan
#[derive(Debug)]
pub struct ExecutionPlan {
    /// Ordered steps to execute
    pub steps: Vec<ExecutionStep>,
    /// Estimated cost
    pub estimated_cost: f64,
    /// Estimated result size
    pub estimated_results: usize,
}

#[derive(Debug)]
pub enum ExecutionStep {
    /// HNSW vector search
    VectorSearch {
        query_vector: Vec<f32>,
        similarity_threshold: f32,
        metric: SimilarityMetric,
        ef: usize,
        filters: Vec<InlineFilter>, // Filters applied during search
    },
    /// Metadata index lookup
    IndexScan {
        index_name: String,
        predicate: Predicate,
    },
    /// Graph structure
    /// traversal
    GraphTraversal {
        constraint: GraphConstraint,
    },
    /// Set intersection (AND)
    Intersect {
        left: Box<ExecutionStep>,
        right: Box<ExecutionStep>,
    },
    /// Set union (OR)
    Union {
        left: Box<ExecutionStep>,
        right: Box<ExecutionStep>,
    },
    /// Set difference (NOT)
    Difference {
        left: Box<ExecutionStep>,
        right: Box<ExecutionStep>,
    },
    /// Hybrid scoring
    HybridScore {
        neural_scores: HashMap<u32, f32>,
        symbolic_scores: HashMap<u32, f32>,
        alpha: f32, // neural weight
        beta: f32,  // symbolic weight
    },
    /// Top-K selection
    TopK {
        input: Box<ExecutionStep>,
        k: usize,
        order_by: Vec<OrderBy>,
    },
}

/// Filter applied during HNSW search (pushdown)
#[derive(Debug, Clone)]
pub struct InlineFilter {
    pub field: String,
    pub operator: ComparisonOp,
    pub value: Value,
}

/// Hybrid scoring configuration
#[derive(Debug, Clone)]
pub struct HybridScoringConfig {
    /// Weight for neural similarity score
    pub neural_weight: f32,
    /// Weight for symbolic score
    pub symbolic_weight: f32,
    /// Normalization method
    pub normalization: NormalizationMethod,
}

#[derive(Debug, Clone, Copy)]
pub enum NormalizationMethod {
    /// Min-max normalization [0, 1]
    MinMax,
    /// Z-score normalization
    ZScore,
    /// None (assume scores already normalized)
    None,
}

/// Query result
#[derive(Debug, Serialize, Deserialize)]
pub struct QueryResult {
    /// Matched node IDs
    pub node_ids: Vec<u32>,
    /// Neural similarity scores
    pub neural_scores: Vec<f32>,
    /// Symbolic scores (if applicable)
    pub symbolic_scores: Option<Vec<f32>>,
    /// Hybrid scores
    pub hybrid_scores: Vec<f32>,
    /// Metadata for each result
    pub metadata: Vec<HashMap<String, Value>>,
    /// Query execution statistics
    pub stats: QueryStats,
}

#[derive(Debug, Serialize, Deserialize, Default)]
pub struct QueryStats {
    /// Total execution time (milliseconds)
    pub total_time_ms: f64,
    /// Time breakdown by step
    pub step_times: Vec<(String, f64)>,
    /// Number of candidates evaluated
    pub candidates_evaluated: usize,
    /// Number of results returned
    pub results_returned: usize,
    /// Index usage
    pub indexes_used: Vec<String>,
}

#[derive(Debug, Clone)]
pub struct OrderBy {
    pub field: String,
    pub direction: SortDirection,
}

#[derive(Debug, Clone, Copy)]
pub enum SortDirection {
    Asc,
    Desc,
}

/// Wrapper for ordered values in B-tree
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum OrderedValue {
    Int(i64),
    Float(OrderedFloat<f64>),
    String(String),
}

use ordered_float::OrderedFloat;
use roaring::RoaringBitmap;
use std::collections::BTreeMap;
use std::sync::Arc;
```

### Key Algorithms (Pseudocode)

#### 1. Query Execution Algorithm

```python
function execute_neuro_symbolic_query(query: Query, engine: NeuroSymbolicEngine) -> QueryResult:
    """
    Execute neuro-symbolic query with hybrid scoring.

    Main algorithm: parse → optimize → plan → execute → score → return
    """
    start_time = now()

    # Step 1: Parse query into AST (already done, query is AST)

    # Step 2: Optimize query (predicate pushdown, index selection)
    optimized_query = engine.optimizer.optimize(query)

    # Step 3: Generate execution plan
    plan = engine.planner.create_plan(optimized_query)

    # Step 4: Execute plan steps
    result_set = execute_plan(plan, engine)

    # Step 5: Hybrid scoring
    if has_both_neural_and_symbolic(plan):
        result_set = apply_hybrid_scoring(result_set, engine.scoring_config)

    # Step 6: Apply ORDER BY and LIMIT
    result_set = sort_and_limit(result_set, query.order_by, query.limit, query.offset)

    # Step 7: Fetch metadata for results
    metadata = fetch_metadata(result_set.node_ids, query.select)

    execution_time = now() - start_time

    return QueryResult(
        node_ids=result_set.node_ids,
        neural_scores=result_set.neural_scores,
        symbolic_scores=result_set.symbolic_scores,
        hybrid_scores=result_set.hybrid_scores,
        metadata=metadata,
        stats=QueryStats(
            total_time_ms=execution_time,
            candidates_evaluated=result_set.candidates_evaluated,
            results_returned=len(result_set.node_ids),
            indexes_used=plan.indexes_used,
        ),
    )

function execute_plan(plan: ExecutionPlan, engine: NeuroSymbolicEngine) -> IntermediateResult:
    """
    Recursively execute plan steps.
""" results = None for step in plan.steps: match step: case VectorSearch: # HNSW search with optional filters results = execute_vector_search(step, engine.hnsw_index) case IndexScan: # Lookup in metadata index results = execute_index_scan(step, engine.metadata_indexes) case GraphTraversal: # Graph structure query results = execute_graph_traversal(step, engine.metadata_indexes.graph_index) case Intersect: # AND: set intersection left = execute_plan_step(step.left, engine) right = execute_plan_step(step.right, engine) results = intersect_results(left, right) case Union: # OR: set union left = execute_plan_step(step.left, engine) right = execute_plan_step(step.right, engine) results = union_results(left, right) case Difference: # NOT: set difference left = execute_plan_step(step.left, engine) right = execute_plan_step(step.right, engine) results = difference_results(left, right) case HybridScore: # Compute hybrid scores results = compute_hybrid_scores( step.neural_scores, step.symbolic_scores, step.alpha, step.beta ) case TopK: # Select top-k results input_results = execute_plan_step(step.input, engine) results = select_top_k(input_results, step.k, step.order_by) return results function execute_vector_search(step: VectorSearch, hnsw: HnswIndex) -> IntermediateResult: """ HNSW search with filter pushdown. Key optimization: Apply symbolic filters during HNSW traversal to avoid generating candidates that will be filtered out anyway. 
""" query_vector = step.query_vector similarity_threshold = step.similarity_threshold ef = step.ef inline_filters = step.filters # HNSW search with inline filtering candidates = [] visited = set() # Start from entry point current_node = hnsw.entry_point layer = hnsw.max_layer while layer >= 0: # Greedy search at this layer while True: neighbors = hnsw.get_neighbors(current_node, layer) best_neighbor = None best_distance = float('inf') for neighbor in neighbors: if neighbor in visited: continue # Apply inline filters BEFORE computing distance if not passes_inline_filters(neighbor, inline_filters, hnsw.metadata): continue # Skip this neighbor entirely # Compute distance only for filtered candidates distance = compute_distance(query_vector, hnsw.get_vector(neighbor)) similarity = distance_to_similarity(distance, step.metric) if similarity >= similarity_threshold: candidates.append((neighbor, similarity)) if distance < best_distance: best_distance = distance best_neighbor = neighbor visited.add(neighbor) if best_neighbor is None: break # No improvement current_node = best_neighbor layer -= 1 # Sort candidates by similarity candidates.sort(key=lambda x: x[1], reverse=True) return IntermediateResult( node_ids=[node_id for node_id, _ in candidates], neural_scores=[score for _, score in candidates], candidates_evaluated=len(visited) ) function passes_inline_filters(node_id: u32, filters: List[InlineFilter], metadata: MetadataStore) -> bool: """ Check if node passes all inline filters. This avoids computing distance for nodes that fail metadata constraints. """ for filter in filters: node_value = metadata.get(node_id, filter.field) if not evaluate_predicate(node_value, filter.operator, filter.value): return False # Failed a filter return True # Passed all filters function execute_index_scan(step: IndexScan, indexes: MetadataIndexes) -> IntermediateResult: """ Scan metadata index to get matching node IDs. 
""" index_name = step.index_name predicate = step.predicate match predicate: case Attribute(field, operator, value): if operator == ComparisonOp.Eq: # Exact match: use inverted index posting_list = indexes.inverted[field].lookup(value) return IntermediateResult( node_ids=posting_list.to_vec(), symbolic_scores=[1.0] * len(posting_list) # Binary: matches or not ) elif operator in [ComparisonOp.Lt, ComparisonOp.Le, ComparisonOp.Gt, ComparisonOp.Ge]: # Range query: use B-tree index matching_nodes = indexes.btree[field].range_query(operator, value) return IntermediateResult( node_ids=matching_nodes.to_vec(), symbolic_scores=[1.0] * len(matching_nodes) ) elif operator == ComparisonOp.In: # IN query: union of inverted index lookups all_nodes = RoaringBitmap() for v in value.list: posting_list = indexes.inverted[field].lookup(v) all_nodes |= posting_list # Union return IntermediateResult( node_ids=all_nodes.to_vec(), symbolic_scores=[1.0] * len(all_nodes) ) function execute_graph_traversal(step: GraphTraversal, graph_index: GraphStructureIndex) -> IntermediateResult: """ Execute graph structural constraint. 
""" match step.constraint: case InCommunity(community_id): # Lookup precomputed community membership node_ids = graph_index.communities.get(community_id) return IntermediateResult( node_ids=node_ids.to_vec(), symbolic_scores=[1.0] * len(node_ids) ) case WithinKHops(source_node, k): # Lookup precomputed k-hop neighborhood key = (source_node, k) if key in graph_index.khop_cache: node_ids = graph_index.khop_cache[key] else: # Compute on-the-fly via BFS node_ids = compute_khop_neighbors(source_node, k, graph_index.graph) return IntermediateResult( node_ids=node_ids.to_vec(), symbolic_scores=[1.0 / (1 + distance)] for distance in range(len(node_ids)) ) case OnPath(source, target): # Check path cache path_nodes = graph_index.path_cache.get_path(source, target) return IntermediateResult( node_ids=path_nodes, symbolic_scores=[1.0] * len(path_nodes) ) function intersect_results(left: IntermediateResult, right: IntermediateResult) -> IntermediateResult: """ Set intersection (AND): keep nodes in both sets. Use Roaring Bitmap for efficient intersection. """ left_bitmap = RoaringBitmap.from_sorted(left.node_ids) right_bitmap = RoaringBitmap.from_sorted(right.node_ids) intersection = left_bitmap & right_bitmap # Bitmap AND # Combine scores (average for simplicity) node_ids = intersection.to_vec() combined_scores = [] for node_id in node_ids: left_score = left.get_score(node_id) right_score = right.get_score(node_id) combined_scores.append((left_score + right_score) / 2.0) return IntermediateResult( node_ids=node_ids, scores=combined_scores ) function apply_hybrid_scoring(result_set, config: HybridScoringConfig) -> IntermediateResult: """ Combine neural and symbolic scores. 
    Formula: hybrid_score = α * normalize(neural) + β * normalize(symbolic)
    """
    neural_scores = result_set.neural_scores
    symbolic_scores = result_set.symbolic_scores

    # Normalize scores to [0, 1]
    if config.normalization == NormalizationMethod.MinMax:
        neural_norm = min_max_normalize(neural_scores)
        symbolic_norm = min_max_normalize(symbolic_scores)
    elif config.normalization == NormalizationMethod.ZScore:
        neural_norm = z_score_normalize(neural_scores)
        symbolic_norm = z_score_normalize(symbolic_scores)
    else:
        neural_norm = neural_scores
        symbolic_norm = symbolic_scores

    # Combine with weights
    alpha = config.neural_weight
    beta = config.symbolic_weight
    hybrid_scores = [alpha * n + beta * s for n, s in zip(neural_norm, symbolic_norm)]

    result_set.hybrid_scores = hybrid_scores
    return result_set
```

#### 2. Query Optimization
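As a concrete warm-up for the optimizer pseudocode, here is a toy illustration of selectivity-based predicate ordering. The cardinalities and field names are invented for illustration; the real optimizer would read them from index statistics:

```python
# Toy selectivity estimates under the uniform assumption: an equality
# predicate on a field with N distinct values matches ~1/N of the nodes.
# All numbers are illustrative.
cardinality = {"category": 50, "author": 5_000}  # distinct values per field
year_fraction = 0.10                             # fraction with year >= 2023

selectivity = {
    "category = 'research'": 1.0 / cardinality["category"],  # 0.02
    "author = 'smith'": 1.0 / cardinality["author"],         # 0.0002
    "year >= 2023": year_fraction,                           # 0.10
}

# Most selective predicate first: evaluating it first shrinks the
# candidate set fastest, so later predicates touch fewer nodes.
ordered = sorted(selectivity, key=selectivity.get)
print(ordered)  # author predicate first, year range last
```

The same ordering principle is what the `sorted(predicates, key=lambda p: selectivities[p])` line in the pseudocode below implements.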
```python
function optimize_query(query: Query, optimizer: QueryOptimizer) -> Query:
    """
    Optimize query execution plan.

    Key optimizations:
    1. Predicate pushdown (filters into HNSW search)
    2. Index selection (choose best index for each predicate)
    3. Join reordering (cheapest predicates first)
    4. Early termination (stop when enough candidates found)
    """
    # Extract predicates from WHERE clause
    predicates = extract_predicates(query.where_clause)

    # Classify predicates
    neural_preds = [p for p in predicates if is_neural_predicate(p)]
    symbolic_preds = [p for p in predicates if is_symbolic_predicate(p)]
    graph_preds = [p for p in predicates if is_graph_predicate(p)]

    # Estimate selectivity for each predicate
    selectivities = {}
    for pred in predicates:
        selectivities[pred] = estimate_selectivity(pred, optimizer.stats)

    # Predicate pushdown: which filters can be applied during HNSW search?
    inline_filters = []
    post_filters = []
    for pred in symbolic_preds:
        if can_pushdown(pred):
            inline_filters.append(pred)
        else:
            post_filters.append(pred)

    # Index selection: choose best index for each symbolic predicate
    index_plan = {}
    for pred in symbolic_preds:
        best_index = choose_best_index(pred, optimizer.indexes, selectivities[pred])
        index_plan[pred] = best_index

    # Reorder predicates: most selective first
    ordered_predicates = sorted(predicates, key=lambda p: selectivities[p])

    # Build optimized execution plan
    optimized_query = rewrite_query(
        query,
        inline_filters=inline_filters,
        post_filters=post_filters,
        index_plan=index_plan,
        predicate_order=ordered_predicates,
    )
    return optimized_query

function estimate_selectivity(predicate, stats) -> float:
    """
    Estimate fraction of nodes matching predicate.

    Uses index statistics (histograms, cardinality, etc.)
    """
    match predicate:
        case VectorSimilarity(threshold):
            # Estimate based on similarity distribution
            return estimate_similarity_selectivity(threshold, stats.similarity_histogram)
        case Attribute(field, operator, value):
            # Estimate based on attribute distribution
            if operator == ComparisonOp.Eq:
                return 1.0 / stats.cardinality[field]  # Uniform assumption
            elif operator in [Lt, Le, Gt, Ge]:
                return estimate_range_selectivity(field, operator, value, stats)
            elif operator == In:
                return len(value.list) / stats.cardinality[field]
        case Graph(constraint):
            # Estimate based on graph structure
            match constraint:
                case InCommunity(id):
                    return stats.community_sizes[id] / stats.total_nodes
                case WithinKHops(node, k):
                    return estimate_khop_size(node, k, stats) / stats.total_nodes

function can_pushdown(predicate) -> bool:
    """
    Check if predicate can be pushed into HNSW search.

    Only simple equality/range predicates on indexed fields can be pushed down.
""" match predicate: case Attribute(field, operator, value): # Can pushdown if operator is simple and field is indexed return operator in [Eq, Lt, Le, Gt, Ge, In] and is_indexed(field) case _: return False # Complex predicates handled post-search ``` ### API Design (Function Signatures) ```rust // File: crates/ruvector-query/src/neuro_symbolic/mod.rs impl NeuroSymbolicEngine { /// Create a new neuro-symbolic query engine pub fn new( hnsw_index: Arc, metadata_path: impl AsRef, ) -> Result; /// Execute a query (SQL or Cypher syntax) pub fn execute_query( &self, query: &str, ) -> Result; /// Execute a parsed query (AST) pub fn execute_parsed_query( &self, query: Query, ) -> Result; /// Add metadata index for a field pub fn create_index( &mut self, field: &str, index_type: IndexType, ) -> Result<(), QueryError>; /// Update hybrid scoring configuration pub fn set_scoring_config(&mut self, config: HybridScoringConfig); /// Get query execution statistics pub fn stats(&self) -> QueryEngineStats; } #[derive(Debug, Clone, Copy)] pub enum IndexType { Inverted, // Categorical fields BTree, // Range queries Bitmap, // Set operations } impl Query { /// Parse SQL query string into AST pub fn parse_sql(query: &str) -> Result; /// Parse Cypher query string into AST pub fn parse_cypher(query: &str) -> Result; /// Validate query syntax and semantics pub fn validate(&self) -> Result<(), ValidationError>; } impl Predicate { /// Evaluate predicate on a node pub fn evaluate( &self, node_id: u32, vector_store: &VectorStore, metadata_store: &MetadataStore, ) -> bool; /// Extract referenced fields pub fn referenced_fields(&self) -> Vec; /// Check if predicate is neural (vector similarity) pub fn is_neural(&self) -> bool; /// Check if predicate is symbolic (metadata) pub fn is_symbolic(&self) -> bool; /// Check if predicate is graph-structural pub fn is_graph_structural(&self) -> bool; } impl MetadataIndexes { /// Create indexes from metadata file pub fn from_metadata(path: impl AsRef) -> 
    Result<Self, IndexError>;

    /// Add inverted index for field
    pub fn add_inverted_index(
        &mut self,
        field: &str,
        values: HashMap<String, Vec<u32>>,
    ) -> Result<(), IndexError>;

    /// Add B-tree index for field
    pub fn add_btree_index(
        &mut self,
        field: &str,
        values: Vec<(OrderedValue, u32)>,
    ) -> Result<(), IndexError>;

    /// Query inverted index
    pub fn query_inverted(&self, field: &str, value: &str) -> Option<&RoaringBitmap>;

    /// Query B-tree index (range)
    pub fn query_btree_range(
        &self,
        field: &str,
        operator: ComparisonOp,
        value: OrderedValue,
    ) -> Option<RoaringBitmap>;

    /// Intersect bitmaps (AND operation)
    pub fn intersect(&self, bitmaps: &[RoaringBitmap]) -> RoaringBitmap;

    /// Union bitmaps (OR operation)
    pub fn union(&self, bitmaps: &[RoaringBitmap]) -> RoaringBitmap;

    /// Difference bitmaps (NOT operation)
    pub fn difference(&self, left: &RoaringBitmap, right: &RoaringBitmap) -> RoaringBitmap;
}

#[derive(Debug, Default)]
pub struct QueryEngineStats {
    pub total_queries: u64,
    pub avg_query_time_ms: f64,
    pub cache_hit_rate: f64,
    pub avg_candidates_evaluated: f64,
}
```

## Integration Points

### Affected Crates/Modules

1. **`ruvector-query`** (New Crate)
   - New module: `src/neuro_symbolic/mod.rs` - Core engine
   - New module: `src/neuro_symbolic/parser.rs` - SQL/Cypher parser
   - New module: `src/neuro_symbolic/optimizer.rs` - Query optimizer
   - New module: `src/neuro_symbolic/planner.rs` - Execution planner
   - New module: `src/neuro_symbolic/indexes.rs` - Metadata indexing
2. **`ruvector-core`** (Integration)
   - Modified: `src/index/hnsw.rs` - Add filter callback support
   - Modified: `src/vector_store.rs` - Expose metadata API
3. **`ruvector-api`** (Exposure)
   - Modified: `src/query.rs` - Add neuro-symbolic query endpoint
   - New: `src/query/sql.rs` - SQL query interface
   - New: `src/query/cypher.rs` - Cypher query interface
4.
**`ruvector-bindings`** (Language Bindings)
   - Modified: `python/src/lib.rs` - Expose query API
   - Modified: `nodejs/src/lib.rs` - Expose query API

### New Modules to Create

```
crates/ruvector-query/                 # New crate
├── src/
│   ├── neuro_symbolic/
│   │   ├── mod.rs                     # Core engine
│   │   ├── parser.rs                  # Query parsing
│   │   ├── optimizer.rs               # Query optimization
│   │   ├── planner.rs                 # Execution planning
│   │   ├── executor.rs                # Query execution
│   │   ├── indexes.rs                 # Metadata indexing
│   │   ├── scoring.rs                 # Hybrid scoring
│   │   └── stats.rs                   # Statistics collection
│   └── lib.rs
examples/
├── neuro_symbolic_queries/
│   ├── sql_examples.rs                # SQL query examples
│   ├── cypher_examples.rs             # Cypher query examples
│   ├── hybrid_scoring.rs              # Hybrid scoring examples
│   └── README.md
```

### Dependencies on Other Features

**Depends On:**
- **HNSW Index**: Core vector search functionality
- **Existing Cypher Support**: Extend existing graph query support

**Synergies With:**
- **GNN-Guided Routing (Feature 1)**: Can use GNN for smarter query execution
- **Incremental Learning (Feature 2)**: Real-time index updates support streaming queries

**External Dependencies:**
- `sqlparser` - SQL parsing
- `cypher-parser` - Cypher parsing (if not already present)
- `roaring` - Roaring Bitmap for efficient set operations
- `serde` - Query serialization

## Regression Prevention

### What Existing Functionality Could Break

1. **Pure Vector Search Performance**
   - Risk: Adding metadata lookups slows down simple vector queries
   - Impact: Regression in baseline HNSW performance
2. **Memory Usage**
   - Risk: Metadata indexes consume excessive RAM
   - Impact: OOM on large datasets
3. **Query Correctness**
   - Risk: Filter pushdown logic has bugs, returns wrong results
   - Impact: Incorrect search results
4.
**Cypher Compatibility**
   - Risk: Extending Cypher syntax breaks existing queries
   - Impact: Breaking change for existing users

### Test Cases to Prevent Regressions

```rust
// File: crates/ruvector-query/tests/neuro_symbolic_regression_tests.rs

#[test]
fn test_pure_vector_search_unchanged() {
    // Simple vector queries should have zero overhead
    let engine = setup_test_engine();

    // Baseline: pure HNSW search (no filters)
    let query_baseline =
        "SELECT * FROM vectors ORDER BY similarity(embedding, $query) DESC LIMIT 10";
    let start = Instant::now();
    let results = engine.execute_query(query_baseline).unwrap();
    let time_with_engine = start.elapsed();

    // Direct HNSW (without query engine)
    let start = Instant::now();
    let results_direct = engine.hnsw_index.search(&query_vector, 10).unwrap();
    let time_direct = start.elapsed();

    // Query engine should add <5% overhead
    let overhead = (time_with_engine.as_secs_f64() / time_direct.as_secs_f64()) - 1.0;
    assert!(overhead < 0.05, "Overhead: {:.2}%, expected <5%", overhead * 100.0);

    // Results should be identical
    assert_eq!(results.node_ids, results_direct.node_ids);
}

#[test]
fn test_filter_correctness() {
    // Filtered queries must return correct subset
    let engine = setup_test_engine_with_metadata();

    let query = "SELECT * FROM vectors \
                 WHERE similarity(embedding, $query) > 0.8 \
                   AND category = 'research' AND year >= 2023 \
                 LIMIT 10";
    let results = engine.execute_query(query).unwrap();

    // Verify each result matches ALL predicates
    for node_id in &results.node_ids {
        let similarity = compute_similarity(&query_vector, engine.get_vector(*node_id));
        assert!(similarity > 0.8, "Node {} similarity: {}, expected >0.8", node_id, similarity);

        let category = engine.get_metadata(*node_id, "category");
        assert_eq!(category, "research", "Node {} category: {}, expected 'research'", node_id, category);

        let year = engine.get_metadata(*node_id, "year").parse::<i32>().unwrap();
        assert!(year >= 2023, "Node {} year: {}, expected >=2023", node_id, year);
    }
}

#[test]
fn
test_filter_pushdown_performance() {
    // Pushdown filters should be much faster than post-filtering
    let engine = setup_test_engine_with_metadata();

    // With pushdown (optimized)
    let query_pushdown = "SELECT * FROM vectors WHERE similarity(embedding, $query) > 0.8 AND category = 'research' LIMIT 10";
    let start = Instant::now();
    let results_pushdown = engine.execute_query(query_pushdown).unwrap();
    let time_pushdown = start.elapsed();

    // Without pushdown (post-filter, manual implementation);
    // the over-retrieval is timed too, since it is part of the cost
    let start = Instant::now();
    let all_results = engine.hnsw_index.search(&query_vector, 10_000).unwrap();
    let results_post: Vec<_> = all_results.into_iter()
        .filter(|r| r.similarity > 0.8)
        .filter(|r| engine.get_metadata(r.node_id, "category") == "research")
        .take(10)
        .collect();
    let time_post = start.elapsed();

    // Pushdown should be ≥5x faster
    let speedup = time_post.as_secs_f64() / time_pushdown.as_secs_f64();
    assert!(speedup >= 5.0, "Speedup: {:.1}x, expected ≥5x", speedup);

    // Both strategies should return the same number of results
    assert_eq!(results_pushdown.node_ids.len(), results_post.len());
}

#[test]
fn test_hybrid_scoring_correctness() {
    // Hybrid scores should combine neural and symbolic components correctly
    let engine = setup_test_engine();
    engine.set_scoring_config(HybridScoringConfig {
        neural_weight: 0.7,
        symbolic_weight: 0.3,
        normalization: NormalizationMethod::MinMax,
    });

    let query = "SELECT * FROM vectors WHERE similarity(embedding, $query) > 0.5 AND year >= 2020 ORDER BY hybrid_score DESC LIMIT 10";
    let results = engine.execute_query(query).unwrap();

    // Verify hybrid score formula
    for i in 0..results.node_ids.len() {
        let neural = results.neural_scores[i];
        let symbolic = results.symbolic_scores.as_ref().unwrap()[i];

        // Normalize (min-max)
        let neural_norm = (neural - 0.5) / (1.0 - 0.5);     // Assuming min=0.5, max=1.0
        let symbolic_norm = (symbolic - 0.0) / (1.0 - 0.0); // Assuming min=0.0, max=1.0

        let expected_hybrid = 0.7 * neural_norm + 0.3 * symbolic_norm;
        let actual_hybrid = results.hybrid_scores[i];
        assert!((expected_hybrid - actual_hybrid).abs() < 1e-5,
            "Hybrid score mismatch: expected {}, got {}", expected_hybrid, actual_hybrid);
    }
}

#[test]
fn test_boolean_logic_correctness() {
    // AND/OR/NOT operations must be correct
    let engine = setup_test_engine();

    // Test AND
    let query_and = "SELECT * FROM vectors WHERE category = 'A' AND tag = 'X'";
    let results_and = engine.execute_query(query_and).unwrap();
    for node_id in &results_and.node_ids {
        assert_eq!(engine.get_metadata(*node_id, "category"), "A");
        assert_eq!(engine.get_metadata(*node_id, "tag"), "X");
    }

    // Test OR
    let query_or = "SELECT * FROM vectors WHERE category = 'A' OR category = 'B'";
    let results_or = engine.execute_query(query_or).unwrap();
    for node_id in &results_or.node_ids {
        let category = engine.get_metadata(*node_id, "category");
        assert!(category == "A" || category == "B");
    }

    // Test NOT
    let query_not = "SELECT * FROM vectors WHERE category = 'A' AND NOT tag = 'X'";
    let results_not = engine.execute_query(query_not).unwrap();
    for node_id in &results_not.node_ids {
        assert_eq!(engine.get_metadata(*node_id, "category"), "A");
        assert_ne!(engine.get_metadata(*node_id, "tag"), "X");
    }
}
```

### Backward Compatibility Strategy

1. **Opt-In Feature**
   - Neuro-symbolic queries are opt-in (require explicit SQL/Cypher syntax)
   - Existing vector search API unchanged
2. **Graceful Degradation**
   - If metadata indexes are not available, fall back to post-filtering
   - Log a warning but do not crash
3. **Configuration**

   ```yaml
   query:
     neuro_symbolic:
       enabled: true          # Default: true
       metadata_indexes: true # Default: true
       hybrid_scoring: true   # Default: true
   ```

4. **API Versioning**
   - New endpoints for neuro-symbolic queries (`/query/sql`, `/query/cypher`)
   - Existing endpoints (`/search`) unchanged

## Implementation Phases

### Phase 1: Core Infrastructure (Week 1-2)

**Goal**: Query parsing and basic execution

**Tasks**:
1. Implement SQL/Cypher parser
2. Build AST representation
3.
Implement basic query executor (no optimization)
4. Unit tests for parsing and execution

**Deliverables**:
- `neuro_symbolic/parser.rs`
- `neuro_symbolic/executor.rs`
- Passing unit tests

**Success Criteria**:
- Can parse and execute simple queries (vector similarity only)
- Correct results (matches HNSW baseline)

### Phase 2: Metadata Indexing (Week 2-3)

**Goal**: Support symbolic predicates

**Tasks**:
1. Implement inverted index for categorical fields
2. Implement B-tree index for range queries
3. Integrate Roaring Bitmap for set operations
4. Test index correctness and performance

**Deliverables**:
- `neuro_symbolic/indexes.rs`
- Index creation and query APIs
- Benchmark report

**Success Criteria**:
- Indexes correctly return matching nodes
- Index queries <10ms for typical workloads
- Memory overhead <20% of vector data

### Phase 3: Filter Pushdown (Week 3-4)

**Goal**: Optimize query execution

**Tasks**:
1. Implement filter pushdown into HNSW search
2. Modify HNSW to support filter callbacks
3. Benchmark speedup vs post-filtering
4. Test correctness of pushdown logic

**Deliverables**:
- Modified `hnsw.rs` with filter support
- `neuro_symbolic/optimizer.rs`
- Performance benchmarks

**Success Criteria**:
- ≥5x speedup for filtered queries
- Zero correctness regressions
- Works with complex boolean logic (AND/OR/NOT)

### Phase 4: Hybrid Scoring (Week 4-5)

**Goal**: Combine neural and symbolic scores

**Tasks**:
1. Implement hybrid scoring algorithm
2. Add score normalization methods
3. Tune weights (α, β) for best results
4. Test on real-world datasets

**Deliverables**:
- `neuro_symbolic/scoring.rs`
- Hybrid scoring benchmarks
- Configuration guide

**Success Criteria**:
- Hybrid queries improve relevance metrics (NDCG, MRR)
- Configurable weights work as expected
- Performance <20ms for typical queries

### Phase 5: Production Hardening (Week 5-6)

**Goal**: Production-ready feature

**Tasks**:
1. Add comprehensive error handling
2. Write documentation and examples
3.
Stress testing (large datasets, complex queries)
4. Integration with existing Cypher support

**Deliverables**:
- Full error handling
- User documentation
- Example queries
- Regression test suite

**Success Criteria**:
- Zero crashes in stress tests
- Documentation complete
- Ready for alpha release

## Success Metrics

### Performance Benchmarks

**Primary Metrics** (Must Achieve):

| Query Type | Baseline (Post-Filter) | Neuro-Symbolic | Target Improvement |
|------------|------------------------|----------------|--------------------|
| Similarity + 1 filter | 50ms | 5ms | **10x faster** |
| Similarity + 3 filters | 200ms | 8ms | **25x faster** |
| Complex boolean (AND/OR/NOT) | N/A (manual) | 15ms | **New capability** |
| Multi-modal (vector + graph) | 500ms (manual joins) | 20ms | **25x faster** |

**Secondary Metrics**:

| Metric | Target |
|--------|--------|
| Index memory overhead | <20% of vector data |
| Query parsing time | <1ms |
| Hybrid scoring overhead | <2ms |
| Concurrent query throughput | Same as baseline |

### Accuracy Metrics

**Relevance Improvement** (on benchmark datasets):
- NDCG@10: +15% (hybrid scoring vs pure vector)
- MRR (Mean Reciprocal Rank): +20%
- Precision@10: +10%

**Correctness**:
- 100% of filtered results match all predicates
- Zero false positives or false negatives

### Memory/Latency Targets

**Memory**:
- Inverted indexes: <100MB per 1M nodes (categorical fields)
- B-tree indexes: <50MB per 1M nodes (range fields)
- Total overhead: <20% of vector index size

**Latency**:
- Simple query (1 filter): <10ms
- Complex query (3+ filters): <20ms
- Hybrid scoring: <5ms overhead
- P99 latency: <50ms

**Throughput**:
- Concurrent queries: Same as baseline HNSW
- No lock contention on indexes

## Risks and Mitigations

### Technical Risks

**Risk 1: Query Parser Complexity**

*Probability: Medium | Impact: Medium*

**Description**: SQL/Cypher parsing is complex and could have bugs or performance issues.
**Mitigation**:
- Use established parsing libraries (`sqlparser`, `cypher-parser`)
- Extensive test suite with edge cases
- Validate AST before execution
- Provide query validation tool

**Contingency**: Start with a simple query subset, expand incrementally.

---

**Risk 2: Index Memory Overhead**

*Probability: High | Impact: Medium*

**Description**: Metadata indexes could consume excessive memory on large datasets.

**Mitigation**:
- Use compressed indexes (Roaring Bitmap for sparse sets)
- Make indexing optional (user chooses which fields to index)
- Monitor memory usage in tests
- Provide index size estimation tool

**Contingency**: Support external indexes (e.g., SQLite) for low-memory environments.

---

**Risk 3: Filter Pushdown Bugs**

*Probability: Medium | Impact: Critical*

**Description**: Incorrect filter logic could return wrong results.

**Mitigation**:
- Extensive correctness testing (ground truth validation)
- Compare pushdown results vs post-filtering
- Add assertion checks in debug builds
- Fuzzing for edge cases

**Contingency**: Add a "safe mode" that validates results against post-filtering.

---

**Risk 4: Hybrid Scoring Tuning Difficulty**

*Probability: High | Impact: Low*

**Description**: Users may struggle to tune α/β weights for hybrid scoring.

**Mitigation**:
- Provide automatic weight tuning (based on query logs)
- Document recommended defaults for common use cases
- Add visualization tools for score distributions
- Support A/B testing framework

**Contingency**: Default to pure neural scoring (α=1, β=0) if the user is unsure.

---

**Risk 5: Cypher Integration Conflicts**

*Probability: Low | Impact: Medium*

**Description**: Extending Cypher syntax could conflict with existing graph queries.

**Mitigation**:
- Careful syntax design (use reserved keywords)
- Version Cypher extensions separately
- Extensive compatibility testing
- Document syntax differences

**Contingency**: Use a separate query language (e.g., extended SQL only) if conflicts arise.
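To make the α/β weighting from Risk 4 concrete, here is a minimal sketch of the hybrid scoring planned for Phase 4 (min-max normalization, then `α * neural + β * symbolic`). Function names are illustrative, not the crate's final API:

```rust
/// Min-max normalize scores into [0, 1]; if all scores are equal,
/// return 0.5 for each to avoid division by zero.
fn min_max_normalize(scores: &[f64]) -> Vec<f64> {
    let min = scores.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    if (max - min).abs() < f64::EPSILON {
        return vec![0.5; scores.len()];
    }
    scores.iter().map(|s| (s - min) / (max - min)).collect()
}

/// Combine per-candidate neural and symbolic scores with weights alpha/beta.
fn hybrid_scores(neural: &[f64], symbolic: &[f64], alpha: f64, beta: f64) -> Vec<f64> {
    let n = min_max_normalize(neural);
    let s = min_max_normalize(symbolic);
    n.iter().zip(s.iter()).map(|(n, s)| alpha * n + beta * s).collect()
}

fn main() {
    let neural = [0.9, 0.6, 0.8];   // cosine similarities
    let symbolic = [1.0, 1.0, 0.0]; // fraction of predicates satisfied
    let scores = hybrid_scores(&neural, &symbolic, 0.7, 0.3);
    // Candidate 0 (highest similarity AND all predicates satisfied) ranks first
    println!("{:?}", scores);
}
```

The Risk 4 contingency corresponds to calling this with `alpha = 1.0, beta = 0.0`, which reduces the formula to pure neural scoring.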
---

### Summary Risk Matrix

| Risk | Probability | Impact | Mitigation Priority |
|------|-------------|--------|---------------------|
| Query parser complexity | Medium | Medium | Medium |
| Index memory overhead | High | Medium | **HIGH** |
| Filter pushdown bugs | Medium | Critical | **CRITICAL** |
| Hybrid scoring tuning | High | Low | LOW |
| Cypher integration conflicts | Low | Medium | Medium |

---

## Next Steps

1. **Prototype Phase 1**: Build SQL parser and basic executor (1 week)
2. **Validate Queries**: Test on simple queries, measure correctness (2 days)
3. **Add Metadata Indexes**: Implement inverted + B-tree indexes (1 week)
4. **Benchmark Performance**: Measure speedup vs post-filtering (3 days)
5. **Iterate**: Optimize based on profiling (ongoing)

**Key Decision Points**:
- After Phase 1: Is query parsing fast enough? (<1ms target)
- After Phase 3: Does filter pushdown work correctly? (Zero regressions)
- After Phase 4: Does hybrid scoring improve relevance? (+10% NDCG required)

**Go/No-Go Criteria**:
- ✅ 5x+ speedup on filtered queries
- ✅ Zero correctness regressions
- ✅ Memory overhead <20%
- ✅ Improved relevance metrics
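As a starting point for step 3 of the Next Steps ("Add Metadata Indexes"), the inverted index can be sketched with std collections alone; in the real implementation the plan's `roaring` dependency would replace `BTreeSet<u64>` with compressed bitmaps. All names here are illustrative:

```rust
use std::collections::{BTreeSet, HashMap};

/// Inverted index: (field, value) -> set of node ids.
/// Production code would use roaring bitmaps instead of BTreeSet.
#[derive(Default)]
struct InvertedIndex {
    postings: HashMap<(String, String), BTreeSet<u64>>,
}

impl InvertedIndex {
    fn insert(&mut self, field: &str, value: &str, node_id: u64) {
        self.postings
            .entry((field.to_string(), value.to_string()))
            .or_default()
            .insert(node_id);
    }

    /// AND of two equality predicates = intersection of posting lists.
    fn and(&self, a: (&str, &str), b: (&str, &str)) -> BTreeSet<u64> {
        let empty = BTreeSet::new();
        let pa = self.postings.get(&(a.0.to_string(), a.1.to_string())).unwrap_or(&empty);
        let pb = self.postings.get(&(b.0.to_string(), b.1.to_string())).unwrap_or(&empty);
        pa.intersection(pb).cloned().collect()
    }
}

fn main() {
    let mut idx = InvertedIndex::default();
    idx.insert("category", "research", 1);
    idx.insert("category", "research", 2);
    idx.insert("tag", "X", 2);
    idx.insert("tag", "X", 3);
    // "category = 'research' AND tag = 'X'" -> only node 2 qualifies
    let hits = idx.and(("category", "research"), ("tag", "X"));
    println!("{:?}", hits); // prints {2}
}
```

Filter pushdown would then pass the resulting id set into the HNSW search as an allow-list, so candidates failing the predicates are skipped during traversal rather than discarded afterwards.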