# ADR-QE-007: QAOA MaxCut Implementation **Status**: Proposed **Date**: 2026-02-06 **Authors**: ruv.io, RuVector Team **Deciders**: Architecture Review Board ## Version History | Version | Date | Author | Changes | |---------|------|--------|---------| | 0.1 | 2026-02-06 | ruv.io | Initial QAOA MaxCut architecture proposal | --- ## Context ### Combinatorial Optimization on Quantum Computers The Quantum Approximate Optimization Algorithm (QAOA), introduced by Farhi, Goldstone, and Gutmann (2014), is a leading candidate for demonstrating quantum advantage on combinatorial optimization problems. QAOA constructs a parameterized quantum circuit that encodes the cost function of an optimization problem and uses classical outer-loop optimization to find parameters that maximize the expected cost. ### MaxCut as the Canonical QAOA Problem MaxCut is the prototypical problem for QAOA: given a graph G = (V, E), partition the vertices into two sets S and S-complement to maximize the number of edges crossing the partition. ``` MaxCut Example (5 vertices, 6 edges): 0 ─── 1 │ \ │ │ \ │ 3 ─── 2 │ 4 Optimal cut: S = {0, 2, 4}, S' = {1, 3} Cut value: 5 edges crossing (0-1, 0-3, 1-2, 2-3, 2-4) ``` The cost function is: ``` C(z) = sum_{(i,j) in E} (1 - z_i * z_j) / 2 ``` where z_i in {+1, -1} encodes the partition assignment. ### QAOA Circuit Structure A depth-p QAOA circuit alternates two types of layers: 1. **Phase separation** (encodes the problem): For each edge (i,j), apply exp(-i * gamma * Z_i Z_j / 2) 2. **Mixing** (explores the solution space): For each qubit i, apply exp(-i * beta * X_i) = Rx(2*beta) ``` QAOA Circuit (p layers): |+> ──[Phase(gamma_1)]──[Mix(beta_1)]──[Phase(gamma_2)]──[Mix(beta_2)]── ... ──[Measure] │ Parameters: gamma = [gamma_1, ..., gamma_p], beta = [beta_1, ..., beta_p] │ ▼ Classical Optimizer ``` ### Why QAOA Matters for ruQu | Motivation | Details | |------------|---------| | Optimization benchmarks | Standard workload for evaluating quantum simulator performance | | Graph problems | Natural integration with ruVector graph database (ruvector-graph) | | Variational algorithm | Shares infrastructure with VQE (ADR-QE-005): parameterized circuits, expectation values, classical optimizers | | Scalability study | QAOA depth and graph size provide tunable complexity for benchmarking | | Agent integration | ruVector agents can use QAOA to solve graph optimization tasks autonomously | --- ## Decision ### 1. Phase Separation Operator: Native Rzz Gate The phase separation operator for MaxCut applies exp(-i * gamma * Z_i Z_j / 2) for each edge (i,j). We implement this as a native two-qubit operation via direct amplitude manipulation, avoiding CNOT decomposition. **Mathematical basis**: ``` exp(-i * theta * Z_i Z_j / 2) acts on computational basis states as: |00> -> e^{-i*theta/2} |00> (Z_i Z_j = +1) |01> -> e^{+i*theta/2} |01> (Z_i Z_j = -1) |10> -> e^{+i*theta/2} |10> (Z_i Z_j = -1) |11> -> e^{-i*theta/2} |11> (Z_i Z_j = +1) ``` In the state vector, for each amplitude at index k: - Extract bits i and j from k - Compute parity = bit_i XOR bit_j - Apply phase: `amp[k] *= exp(-i * theta * (-1)^parity / 2)` - If parity = 0 (same bits): `amp[k] *= exp(-i * theta / 2)` - If parity = 1 (different bits): `amp[k] *= exp(+i * theta / 2)` ```rust impl QuantumState { /// Apply Rzz(theta) = exp(-i * theta * Z_i Z_j / 2) via direct amplitude /// manipulation. /// /// For each basis state |k>: /// - Compute parity of bits i and j in k /// - Apply phase e^{-i * theta * (-1)^parity / 2} /// /// Complexity: O(2^n) -- single pass over state vector. /// Vectorizable: all amplitudes are independent (no swaps). /// /// Hardware equivalent: CNOT(i,j) + Rz(theta, j) + CNOT(i,j) = 3 gates. pub fn rzz(&mut self, theta: f64, qubit_i: usize, qubit_j: usize) { let phase_same = Complex64::from_polar(1.0, -theta / 2.0); let phase_diff = Complex64::from_polar(1.0, theta / 2.0); let mask_i = 1_usize << qubit_i; let mask_j = 1_usize << qubit_j; for k in 0..self.amplitudes.len() { let bit_i = (k & mask_i) >> qubit_i; let bit_j = (k & mask_j) >> qubit_j; let parity = bit_i ^ bit_j; if parity == 0 { self.amplitudes[k] *= phase_same; } else { self.amplitudes[k] *= phase_diff; } } } } ``` **Vectorization opportunity**: The inner loop is a streaming operation over the amplitude array with no data dependencies between iterations. This is ideal for SIMD vectorization (AVX-512 can process 8 complex64 values per instruction) and parallelization across cores. ### 2. Mixing Operator The mixing operator applies Rx(2*beta) to each qubit: ``` Rx(2*beta) = exp(-i * beta * X) = [[cos(beta), -i*sin(beta)], [-i*sin(beta), cos(beta)]] ``` This uses the standard single-qubit gate application from the simulator core: ```rust impl QuantumState { /// Apply the QAOA mixing operator: Rx(2*beta) on each qubit. /// /// Complexity: O(n * 2^n) for n qubits. pub fn qaoa_mixing(&mut self, beta: f64) { for qubit in 0..self.n_qubits { self.rx(2.0 * beta, qubit); } } } ``` ### 3. QAOA Circuit Construction A convenience function builds the full QAOA circuit from a graph and parameters: ```rust /// A graph represented as an edge list with optional weights. pub struct Graph { /// Number of vertices pub n_vertices: usize, /// Edges: (vertex_i, vertex_j, weight) pub edges: Vec<(usize, usize, f64)>, } impl Graph { /// Construct from adjacency list. pub fn from_adjacency_list(adj: &[Vec]) -> Self; /// Construct from edge list (unweighted, weight = 1.0). pub fn from_edge_list(n_vertices: usize, edges: &[(usize, usize)]) -> Self; /// Load from ruVector graph query result. pub fn from_ruvector_query(result: &GraphQueryResult) -> Self; } /// QAOA configuration. pub struct QaoaConfig { /// Graph defining the MaxCut instance pub graph: Graph, /// QAOA depth (number of layers) pub p: usize, /// Gamma parameters (phase separation angles), length = p pub gammas: Vec, /// Beta parameters (mixing angles), length = p pub betas: Vec, } /// Build and simulate a QAOA circuit for MaxCut. /// /// Circuit structure for depth p: /// 1. Initialize |+>^n (Hadamard on all qubits) /// 2. For layer l = 1..p: /// a. Phase separation: Rzz(gamma_l, i, j) for each edge (i,j) /// b. Mixing: Rx(2*beta_l) on each qubit /// 3. Return final state pub fn build_qaoa_circuit(config: &QaoaConfig) -> QuantumState { let n = config.graph.n_vertices; let mut state = QuantumState::new(n); // Step 1: Initialize uniform superposition state.hadamard_all(); // Step 2: Alternating phase separation and mixing layers for layer in 0..config.p { let gamma = config.gammas[layer]; let beta = config.betas[layer]; // Phase separation: apply Rzz for each edge for &(i, j, weight) in &config.graph.edges { state.rzz(gamma * weight, i, j); } // Mixing: Rx(2*beta) on each qubit state.qaoa_mixing(beta); } state } ``` **Pseudocode for the complete QAOA MaxCut solver**: ```rust pub fn qaoa_maxcut( graph: &Graph, p: usize, optimizer: &mut dyn ClassicalOptimizer, config: &QaoaOptConfig, ) -> QaoaResult { let n_params = 2 * p; // p gammas + p betas optimizer.initialize(n_params); let mut params = config.initial_params.clone() .unwrap_or_else(|| { // Standard initialization: gamma in [0, pi], beta in [0, pi/2] let mut p_init = vec![0.0; n_params]; for i in 0..p { p_init[i] = 0.5; // gamma_i p_init[p + i] = 0.25; // beta_i } p_init }); let mut best_cost = f64::NEG_INFINITY; let mut best_params = params.clone(); let mut history = Vec::new(); for iteration in 0..config.max_iterations { let gammas = params[..p].to_vec(); let betas = params[p..].to_vec(); // Build and simulate circuit let qaoa_config = QaoaConfig { graph: graph.clone(), p, gammas, betas, }; let state = build_qaoa_circuit(&qaoa_config); // Evaluate MaxCut cost function let cost = maxcut_expectation(&state, graph); if cost > best_cost { best_cost = cost; best_params = params.clone(); } // Gradient computation (parameter-shift rule, same as VQE) let grad = if optimizer.needs_gradient() { Some(qaoa_gradient(graph, p, ¶ms)) } else { None }; history.push(QaoaIteration { iteration, cost, params: params.clone() }); let result = optimizer.step(¶ms, -cost, grad.as_deref()); // Note: negate cost because optimizer minimizes params = result.new_params; if result.converged { break; } } // Sample the final state to get candidate cuts let final_state = build_qaoa_circuit(&QaoaConfig { graph: graph.clone(), p, gammas: best_params[..p].to_vec(), betas: best_params[p..].to_vec(), }); let best_cut = sample_maxcut(&final_state, graph, config.sample_shots); QaoaResult { best_cost, best_params, best_cut, iterations: history.len(), history, approximation_ratio: best_cost / graph.max_cut_upper_bound(), } } ``` ### 4. Cost Function Evaluation The MaxCut cost function in Pauli operator form is: ``` C = sum_{(i,j) in E} w_{ij} * (1 - Z_i Z_j) / 2 ``` This reuses the PauliSum expectation API from ADR-QE-005: ```rust /// Compute the MaxCut cost as the expectation value of the cost Hamiltonian. /// /// C = sum_{(i,j) in E} w_ij * (1 - Z_i Z_j) / 2 /// = sum_{(i,j) in E} w_ij/2 - sum_{(i,j) in E} w_ij/2 * Z_i Z_j /// = const - sum_{(i,j)} w_ij/2 * /// /// Each Z_i Z_j expectation is computed via the efficient diagonal trick: /// = sum_k |amp_k|^2 * (-1)^{bit_i(k) XOR bit_j(k)} pub fn maxcut_expectation(state: &QuantumState, graph: &Graph) -> f64 { let mut cost = 0.0; for &(i, j, weight) in &graph.edges { let mask_i = 1_usize << i; let mask_j = 1_usize << j; let mut zz_expectation = 0.0; for k in 0..state.amplitudes.len() { let bit_i = (k & mask_i) >> i; let bit_j = (k & mask_j) >> j; let parity = bit_i ^ bit_j; let sign = 1.0 - 2.0 * parity as f64; // +1 if same, -1 if different zz_expectation += state.amplitudes[k].norm_sqr() * sign; } cost += weight * (1.0 - zz_expectation) / 2.0; } cost } ``` **Optimization**: Since Z_i Z_j is diagonal in the computational basis, the expectation reduces to a weighted sum over probabilities. No amplitude swapping is needed, and the computation is embarrassingly parallel. ### 5. Sampling Mode In addition to exact expectation values, we support sampling the final state to obtain candidate cuts: ```rust /// Sample the QAOA state to find candidate MaxCut solutions. /// /// Returns the best cut found across `shots` samples. pub fn sample_maxcut( state: &QuantumState, graph: &Graph, shots: usize, ) -> MaxCutSolution { let probabilities: Vec = state.amplitudes.iter() .map(|a| a.norm_sqr()) .collect(); let mut best_cut_value = 0.0; let mut best_bitstring = 0_usize; let mut rng = thread_rng(); for _ in 0..shots { // Sample from probability distribution let sample = sample_from_distribution(&probabilities, &mut rng); // Evaluate cut value for this bitstring let cut_value = evaluate_cut(sample, graph); if cut_value > best_cut_value { best_cut_value = cut_value; best_bitstring = sample; } } MaxCutSolution { partition: best_bitstring, cut_value: best_cut_value, set_s: (0..graph.n_vertices) .filter(|&v| (best_bitstring >> v) & 1 == 1) .collect(), set_s_complement: (0..graph.n_vertices) .filter(|&v| (best_bitstring >> v) & 1 == 0) .collect(), } } ``` ### 6. Graph Interface Three input modes cover common use cases: ```rust impl Graph { /// From adjacency list (unweighted). /// /// Example: adj[0] = [1, 3] means vertex 0 connects to 1 and 3. pub fn from_adjacency_list(adj: &[Vec]) -> Self { let n = adj.len(); let mut edges = Vec::new(); let mut seen = std::collections::HashSet::new(); for (u, neighbors) in adj.iter().enumerate() { for &v in neighbors { let edge = if u < v { (u, v) } else { (v, u) }; if seen.insert(edge) { edges.push((edge.0, edge.1, 1.0)); } } } Self { n_vertices: n, edges } } /// From edge list with uniform weight. pub fn from_edge_list(n_vertices: usize, edge_list: &[(usize, usize)]) -> Self { Self { n_vertices, edges: edge_list.iter().map(|&(u, v)| (u, v, 1.0)).collect(), } } /// From ruVector graph database query result. /// /// Enables QAOA MaxCut on graphs stored in ruvector-graph. pub fn from_ruvector_query(result: &GraphQueryResult) -> Self { // Convert ruvector-graph nodes and edges to QAOA format // Vertex IDs are remapped to contiguous 0..n range todo!() } } ``` ### 7. Tensor Network Optimization for Sparse Graphs For sparse or planar graphs, the QAOA state can be represented more efficiently using tensor network contraction. The key insight is that QAOA circuits have a structure dictated by the graph topology: ``` Tensor Network View of QAOA: Qubit 0: ──[H]──[Rzz(0,1)]──[Rzz(0,3)]──[Rx]── ... Qubit 1: ──[H]──[Rzz(0,1)]──[Rzz(1,2)]──[Rx]── ... Qubit 2: ──[H]──[Rzz(1,2)]──[Rzz(2,3)]──[Rx]── ... Qubit 3: ──[H]──[Rzz(0,3)]──[Rzz(2,3)]──[Rx]── ... For a planar graph with treewidth w, tensor contraction costs O(2^w * poly(n)) instead of O(2^n). For many practical graphs, w << n. ``` ```rust /// Detect graph treewidth and decide simulation strategy. pub fn select_simulation_strategy(graph: &Graph) -> SimulationStrategy { let treewidth = estimate_treewidth(graph); let n = graph.n_vertices; if treewidth <= 20 && n > 24 { // Tensor network contraction is cheaper than full state vector SimulationStrategy::TensorNetwork { contraction_order: compute_contraction_order(graph), estimated_cost: (1 << treewidth) * n * n, } } else { SimulationStrategy::StateVector { estimated_cost: 1 << n, } } } pub enum SimulationStrategy { StateVector { estimated_cost: usize }, TensorNetwork { contraction_order: Vec, estimated_cost: usize, }, } ``` ### 8. Performance Analysis #### Gate Counts and Timing For a graph with n vertices, m edges, and QAOA depth p: | Operation | Gate Count per Layer | Total Gates (p layers) | |-----------|---------------------|----------------------| | Phase separation (Rzz) | m | p * m | | Mixing (Rx) | n | p * n | | **Total per layer** | **m + n** | **p * (m + n)** | **Benchmark estimates**: | Configuration | n | m | p | Total Gates | Estimated Time | |---------------|---|---|---|-------------|---------------| | Small triangle | 3 | 3 | 1 | 6 | <0.01ms | | Petersen graph | 10 | 15 | 3 | 75 | <0.1ms | | Random d-reg (d=3) | 10 | 15 | 5 | 125 | <0.5ms | | Grid 4x5 | 20 | 31 | 3 | 189 | ~50ms | | Grid 4x5 | 20 | 31 | 5 | 315 | ~100ms | | Random d-reg (d=4) | 20 | 40 | 5 | 400 | ~200ms | | Dense (complete) | 20 | 190 | 3 | 630 | ~300ms | | Sparse large | 24 | 36 | 3 | 216 | ~5s | | Dense large | 24 | 276 | 5 | 1500 | ~30s | **Memory requirements**: | Qubits | State Vector Size | Memory | |--------|------------------|--------| | 10 | 1,024 | 16 KB | | 16 | 65,536 | 1 MB | | 20 | 1,048,576 | 16 MB | | 24 | 16,777,216 | 256 MB | | 28 | 268,435,456 | 4 GB | ### 9. Integration with ruvector-graph The connection to ruVector's graph database enables a powerful workflow: ``` ┌─────────────────────────────────────────────────────────────────────┐ │ QAOA MaxCut Pipeline │ │ │ │ ┌──────────────┐ ┌────────────────┐ ┌──────────────────┐ │ │ │ ruvector-graph│ │ QAOA Engine │ │ Result Store │ │ │ │ │ │ │ │ │ │ │ │ Query: │────>│ Build circuit │────>│ Optimal cut │ │ │ │ "find all │ │ Optimize │ │ Partition │ │ │ │ connected │ │ Sample │ │ Approximation │ │ │ │ subgraphs │ │ │ │ ratio │ │ │ │ of size k" │ │ │ │ │ │ │ └──────────────┘ └────────────────┘ └──────────────────┘ │ │ │ │ Data Flow: │ │ 1. Agent queries ruvector-graph for subgraph │ │ 2. Graph converted to QAOA format via Graph::from_ruvector_query() │ │ 3. QAOA optimizer runs with configurable depth p │ │ 4. Results stored in ruVector memory for pattern learning │ │ 5. Agent uses learned patterns to choose p and initial parameters │ └─────────────────────────────────────────────────────────────────────┘ ``` The ruvector-mincut integration is particularly relevant: the existing `SubpolynomialMinCut` algorithm (El-Hayek/Henzinger/Li, O(n^{o(1)}) amortized) provides exact min-cut values that serve as a lower bound for MaxCut verification. QAOA solutions can be validated against this classical baseline. --- ## Consequences ### Benefits 1. **Native Rzz gate** via direct amplitude manipulation avoids CNOT decomposition, yielding a simpler and faster phase separation implementation 2. **PauliSum expectation API reuse** from ADR-QE-005 provides a unified interface for all variational algorithms (VQE, QAOA, and future extensions) 3. **Graph interface flexibility** supports adjacency lists, edge lists, and ruVector graph queries, covering the most common input formats 4. **Tensor network fallback** for low-treewidth graphs extends QAOA to larger problem instances than pure state vector simulation allows 5. **ruvector-graph integration** enables a seamless pipeline from graph storage to quantum optimization to result analysis ### Risks | Risk | Probability | Impact | Mitigation | |------|------------|--------|------------| | QAOA at low depth p gives poor approximation ratios | High | Medium | Support high-p QAOA, classical warm-starting | | Treewidth estimation is NP-hard in general | Medium | Low | Use heuristic upper bounds (min-degree, greedy) | | Parameter landscape has many local minima | Medium | Medium | Multi-start optimization, INTERP initialization | | Large dense graphs exhaust memory | Medium | High | Tensor network fallback, graph coarsening | ### Trade-offs | Decision | Advantage | Disadvantage | |----------|-----------|--------------| | Direct Rzz over CNOT decomposition | Simpler, faster | Not a one-to-one hardware circuit mapping | | Exact expectation over sampling | No statistical noise | Does not model real hardware shot noise | | Automatic strategy selection | Transparent to user | Additional complexity in simulation backend | | Integrated graph interface | Seamless workflow | Coupling to ruvector-graph API | --- ## References - Farhi, E., Goldstone, J., Gutmann, S. "A Quantum Approximate Optimization Algorithm." arXiv:1411.4028 (2014) - Hadfield, S. et al. "From the Quantum Approximate Optimization Algorithm to a Quantum Alternating Operator Ansatz." Algorithms 12, 34 (2019) - Zhou, L. et al. "Quantum Approximate Optimization Algorithm: Performance, Mechanism, and Implementation on Near-Term Devices." Physical Review X 10, 021067 (2020) - Guerreschi, G.G., Matsuura, A.Y. "QAOA for Max-Cut requires hundreds of qubits for quantum speed-up." Scientific Reports 9, 6903 (2019) - ADR-001: ruQu Architecture - Classical Nervous System for Quantum Machines - ADR-QE-005: VQE Algorithm Support (shared parameterized circuit and optimizer infrastructure) - ADR-QE-006: Grover's Search Implementation (quantum state manipulation primitives) - ruvector-mincut: `crates/ruvector-mincut/` - El-Hayek/Henzinger/Li subpolynomial min-cut - ruvector-graph: graph database integration for sourcing MaxCut instances