# ADR-QE-009: Tensor Network Evaluation Mode

**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board

---

## Context

Full state-vector simulation stores all 2^n complex amplitudes explicitly, yielding O(2^n) memory and O(G * 2^n) time for G gates. At n=30 this is 16 GiB; at n=40 it reaches 16 TiB. Many practically interesting circuits, however, contain limited entanglement:

| Circuit family | Entanglement structure | Treewidth |
|---|---|---|
| Shallow QAOA on sparse graphs | Bounded by graph degree | Low (often < 20) |
| Separate-register circuits | Disjoint qubit subsets | Sum of sub-widths |
| Near-Clifford circuits | Stabilizer + few T gates | Depends on T count |
| 1D brickwork (finite depth) | Area-law entanglement | O(depth) |
| Random deep circuits (all-to-all) | Volume-law entanglement | O(n) -- no gain |

For the first four families, tensor network (TN) methods can trade increased computation for drastically reduced memory by representing each gate as a tensor and contracting the resulting network in an optimized order. The contraction cost scales exponentially in the *treewidth* of the circuit's line graph rather than in the total qubit count. QuantRS2 (the Rust quantum simulation reference) demonstrated tensor network contraction for circuits up to 60 qubits on commodity hardware when treewidth remained below ~25. ruVector's existing `ruvector-mincut` crate already solves graph partitioning problems that are structurally identical to contraction-order optimization, providing a natural integration point.

The ruQu engine needs this capability to support:

1. Surface code simulations at distance d >= 7 (49+ data qubits) for decoder validation, where the syndrome extraction circuit is shallow and geometrically local.
2. Variational algorithm prototyping (VQE, QAOA) on graphs larger than 30 nodes.
3. Hybrid workflows where part of the circuit is simulated via state vector and part via tensor contraction.

## Decision

### 1. Feature-Gated Backend

Tensor network evaluation is implemented as an optional backend behind the `tensor-network` feature flag in `ruqu-core`:

```toml
# ruqu-core/Cargo.toml
[features]
default = ["state-vector"]
state-vector = []
tensor-network = ["dep:ndarray", "dep:petgraph"]
all-backends = ["state-vector", "tensor-network"]
```

When both backends are compiled in, the engine selects the backend at runtime based on circuit analysis (see Section 5 below).

### 2. Tensor Representation

Every gate becomes a tensor connecting the qubit wire indices it acts on:

| Gate type | Tensor rank | Shape | Example |
|---|---|---|---|
| Single-qubit (H, X, Rz, ...) | 2 | [2, 2] | Input wire -> output wire |
| Two-qubit (CNOT, CZ, ...) | 4 | [2, 2, 2, 2] | Two input wires -> two output wires |
| Three-qubit (Toffoli) | 6 | [2, 2, 2, 2, 2, 2] | Three inputs -> three outputs |
| Measurement projector | 2 | [2, 2] | Diagonal in computational basis |
| Initial state \|0> | 1 | [2] | Single output wire |

The circuit is converted into a tensor network graph where:

- Each tensor is a node.
- Each shared index (qubit wire between consecutive gates) is an edge.
- Open indices represent initial states and final measurement outcomes.

```
|0>---[H]---[CNOT_ctrl]---[Rz]---
                 |
|0>---------[CNOT_tgt]-----------
```

Becomes:

```
Node: init_0 (rank 1)
        |
Node: H_0 (rank 2)
        |
Node: CNOT_01 (rank 4)
       / \
      |   Node: Rz_0 (rank 2)
      |         |
      |   Node: meas_0 (rank 2)
      |
Node: init_1 (rank 1)   ... (connected via CNOT shared index)
Node: meas_1 (rank 2)
```

### 3. Contraction Strategy

Contraction order determines whether the computation is tractable. The cost of contracting two tensors is the product of the dimensions of all indices involved. Finding the optimal contraction order is NP-hard (equivalent to finding minimum treewidth), so we use heuristics.
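As a concrete instance of this cost model, the sketch below computes the pairwise contraction cost from a label-to-dimension map per tensor. This is illustrative only: `contraction_cost` and the `Indices` representation are not part of the ruqu-core API.

```rust
use std::collections::BTreeMap;

/// Index label -> dimension for one tensor's indices (illustrative representation).
type Indices = BTreeMap<char, usize>;

/// Cost of contracting two tensors: the product of the dimensions of every
/// index appearing on either tensor, with shared indices counted once.
fn contraction_cost(a: &Indices, b: &Indices) -> usize {
    let mut dims = a.clone();
    for (&label, &d) in b {
        dims.insert(label, d); // shared labels overwrite with the same dimension
    }
    dims.values().product()
}

fn main() {
    // Two rank-4 gate tensors sharing one qubit-wire index 'c'.
    // A has indices {a, b, c, d}, B has {c, e, f, g}, all of dimension 2.
    let a: Indices = "abcd".chars().map(|i| (i, 2)).collect();
    let b: Indices = "cefg".chars().map(|i| (i, 2)).collect();
    // 7 distinct indices of dimension 2 -> cost 2^7 = 128.
    println!("{}", contraction_cost(&a, &b)); // prints 128
}
```

Note that the cost counts a shared (contracted) index once, which is why merging tensors along high-dimension bonds early is what the greedy heuristic tries to avoid.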
#### Contraction Path Optimization Pseudocode

```
function find_contraction_path(tensor_network: TN) -> ContractionPath:
    // Phase 1: Simplify the network
    apply_trivial_contractions(tensor_network)  // rank-1 tensors, diagonal pairs

    // Phase 2: Detect community structure
    communities = detect_communities(tensor_network.graph)

    // Phase 3: Contract within communities first (small subproblems)
    intra_paths = []
    for community in communities:
        subgraph = tensor_network.subgraph(community)
        if subgraph.num_tensors <= 20:
            // Exact dynamic programming for small subgraphs
            path = optimal_einsum_dp(subgraph)
        else:
            // Greedy with lookahead for larger subgraphs
            path = greedy_with_lookahead(subgraph, lookahead=2)
        intra_paths.append(path)

    // Phase 4: Contract inter-community edges
    // Each community is now a single large tensor
    meta_graph = contract_communities(tensor_network, intra_paths)
    inter_path = greedy_with_lookahead(meta_graph, lookahead=3)

    // Phase 5: Compose the full path
    return compose_paths(intra_paths, inter_path)

function greedy_with_lookahead(tn: TN, lookahead: int) -> Path:
    path = []
    remaining = tn.clone()
    while remaining.num_tensors > 1:
        best_cost = INFINITY
        best_pair = None
        // Evaluate all candidate contractions
        for (i, j) in remaining.candidate_pairs():
            cost = contraction_cost(remaining, i, j)
            // Lookahead: estimate cost of subsequent contractions
            if lookahead > 0:
                simulated = remaining.simulate_contraction(i, j)
                future_cost = estimate_future_cost(simulated, lookahead - 1)
                cost += future_cost * DISCOUNT_FACTOR
            if cost < best_cost:
                best_cost = cost
                best_pair = (i, j)
        path.append(best_pair)
        remaining.contract(best_pair)
    return path
```

#### Community Detection via ruvector-mincut

The `ruvector-mincut` crate provides graph partitioning that is directly applicable to contraction ordering:

```rust
use ruvector_mincut::{partition, Objective, PartitionConfig};

fn partition_tensor_network(tn: &TensorNetwork) -> Vec<Vec<usize>> {
    let graph = tn.to_adjacency_graph();
    let config = PartitionConfig {
        num_partitions: estimate_optimal_partitions(tn),
        balance_factor: 1.1,          // Allow 10% imbalance
        minimize: Objective::EdgeCut, // Minimize inter-partition wires
    };
    partition(&graph, &config)
}
```

The edge cut directly corresponds to the bond dimension of the inter-community contraction, so minimizing edge cut minimizes the most expensive contraction step.

### 4. MPS (Matrix Product State) Mode

For circuits with 1D-like connectivity (nearest-neighbor gates on a line), a Matrix Product State representation is more efficient than general tensor contraction.

```
A[1] -- A[2] -- A[3] -- ... -- A[n]
  |       |       |             |
phys_1  phys_2  phys_3        phys_n
```

Each site tensor A[i] has shape `[bond_left, physical, bond_right]` where:

- `physical` = 2 (qubit dimension)
- `bond_left`, `bond_right` = bond dimension chi

At 16 bytes per complex amplitude, a bulk site tensor holds 2 * chi^2 entries:

| Bond dimension (chi) | Memory per site | Total memory (n qubits) | Approximation |
|---|---|---|---|
| 1 | 32 bytes | 32n bytes | Product state only |
| 16 | 8 KiB | 8n KiB | Low entanglement |
| 64 | 128 KiB | 128n KiB | Moderate entanglement |
| 256 | 2 MiB | 2n MiB | High entanglement |
| 1024 | 32 MiB | 32n MiB | Near exact for many circuits |

**Truncation policy**: After each two-qubit gate, perform an SVD on the updated bond. If the bond dimension exceeds `chi_max`, truncate the smallest singular values. Track the total discarded weight (sum of squared discarded singular values) as a fidelity estimate:

```rust
pub struct MpsConfig {
    /// Maximum bond dimension. Truncation occurs above this.
    pub chi_max: usize,
    /// Minimum singular value to retain (relative to largest).
    pub svd_cutoff: f64,
    /// Accumulated truncation error (updated during simulation).
    pub fidelity_estimate: f64,
}

impl Default for MpsConfig {
    fn default() -> Self {
        Self {
            chi_max: 256,
            svd_cutoff: 1e-12,
            fidelity_estimate: 1.0,
        }
    }
}
```

### 5. Automatic Mode Selection

The engine analyzes the circuit before execution to recommend a backend:

```rust
pub enum RecommendedBackend {
    StateVector { reason: &'static str },
    TensorNetwork { estimated_treewidth: usize, reason: &'static str },
    Mps { estimated_max_bond: usize, reason: &'static str },
}

pub fn recommend_backend(circuit: &QuantumCircuit) -> RecommendedBackend {
    let n = circuit.num_qubits();
    let depth = circuit.depth();
    let connectivity = circuit.connectivity_graph();

    // Rule 1: Small circuits always use state vector
    if n <= 20 {
        return RecommendedBackend::StateVector {
            reason: "Small circuit; state vector is fastest below 20 qubits",
        };
    }

    // Rule 2: Check for 1D connectivity (MPS candidate)
    if connectivity.max_degree() <= 2 && connectivity.is_path_graph() {
        let estimated_bond = 2_usize.pow(depth.min(20) as u32);
        return RecommendedBackend::Mps {
            estimated_max_bond: estimated_bond,
            reason: "1D nearest-neighbor connectivity detected",
        };
    }

    // Rule 3: Estimate treewidth for general TN
    let estimated_tw = estimate_treewidth(&connectivity, depth);
    if estimated_tw < 25 && n > 25 {
        return RecommendedBackend::TensorNetwork {
            estimated_treewidth: estimated_tw,
            reason: "Low treewidth relative to qubit count",
        };
    }

    // Rule 4: Check memory feasibility for state vector
    let sv_memory = 16 * (1_usize << n); // 2^n amplitudes * 16 bytes
    let available = estimate_available_memory();
    if sv_memory > available {
        // Force TN even if treewidth is high -- at least it has a chance
        return RecommendedBackend::TensorNetwork {
            estimated_treewidth: estimated_tw,
            reason: "State vector exceeds available memory; TN is only option",
        };
    }

    RecommendedBackend::StateVector {
        reason: "High treewidth circuit; state vector is more efficient",
    }
}
```

### 6. When Tensor Networks Win vs Lose

**Tensor networks win when:**

| Scenario | Why TN wins | Example |
|---|---|---|
| Shallow circuits on many qubits | Treewidth ~ depth, not n | 50-qubit depth-4 QAOA |
| Sparse graph connectivity | Low treewidth from graph structure | MaxCut on 3-regular graph |
| Separate registers | Independent contractions | n/2 Bell pairs |
| Near-Clifford | Stabilizer + few non-Clifford gates | Clifford + 5 T gates |
| Amplitude computation | Contract to single output, not full state | Sampling one bitstring |

**Tensor networks lose when:**

| Scenario | Why TN loses | Fallback |
|---|---|---|
| Deep random circuits | Treewidth ~ n | State vector (if n <= 30) |
| All-to-all connectivity | No structure to exploit | State vector |
| Full state tomography needed | Must contract once per amplitude | State vector |
| Very small circuits (n < 20) | Overhead exceeds state vector | State vector |
| High-fidelity MPS needed | Bond dimension grows exponentially | State vector or exact TN |

### 7. Example: 50-Qubit Shallow QAOA

Consider QAOA at depth p=1 on a 50-node 3-regular graph:

```
Circuit structure:
- 50 qubits, initialized to |+> (50 H gates)
- 75 ZZ gates (one per edge), parameterized by gamma
- 50 Rx gates, parameterized by beta
- Total: 50 + 75 + 50 = 175 gates
- Circuit depth: 4 (H layer, ZZ layer (3-colorable), Rx layer, measure)

Graph treewidth of a 3-regular graph: typically 8-15

Tensor network contraction:
- Community detection finds ~5-8 communities of 6-10 nodes
- Intra-community contraction: O(2^10) ~ 1024 per community
- Inter-community bonds: ~15 edges cut
- Effective contraction complexity: O(2^15) = 32768
- Compare to state vector: O(2^50) ~ 1.1 * 10^15

Memory comparison:
- State vector: 2^50 * 16 bytes = 16 PiB (impossible)
- Tensor network: ~100 MiB working memory
- Speedup factor: practically infinite (feasible vs infeasible)
```

```
Contraction Diagram (simplified):

Community A        Community B        Community C
 [q0-q9]           [q10-q19]          [q20-q29]
    |                  |                  |
    +--- bond=2^3 -----+---- bond=2^4 ----+
                       |
Community D        Community E
 [q30-q39]         [q40-q49]
    |                  |
    +--- bond=2^3 -----+

Peak intermediate tensor: 2^15 elements = 512 KiB
```

### 8. Integration with State Vector Backend

Both backends implement the same trait:

```rust
pub trait SimulationBackend {
    /// Execute the circuit and return measurement results.
    fn execute(
        &self,
        circuit: &QuantumCircuit,
        shots: usize,
        config: &SimulationConfig,
    ) -> Result<MeasurementResults>;

    /// Compute expectation value of an observable.
    fn expectation_value(
        &self,
        circuit: &QuantumCircuit,
        observable: &Observable,
        config: &SimulationConfig,
    ) -> Result<f64>;

    /// Return the backend name for logging.
    fn name(&self) -> &'static str;
}
```

Users interact through `QuantumCircuit` and never need to know which backend is active:

```rust
let circuit = QuantumCircuit::new(50)
    .h_all()
    .append_qaoa_layer(graph, gamma, beta)
    .measure_all();

// Automatic backend selection
let result = ruqu::execute(&circuit, 1000)?;
// -> Internally selects the TensorNetwork backend due to n=50, low treewidth

// Or explicit backend override
let result = ruqu::execute_with_backend(
    &circuit,
    1000,
    Backend::TensorNetwork(TnConfig::default()),
)?;
```

### 9. Future: ruvector-mincut Integration for Contraction Ordering

The `ruvector-mincut` crate currently solves balanced graph partitioning for vector index sharding. The same algorithm applies directly to tensor network contraction ordering via the following correspondence:

| Graph partitioning concept | TN contraction concept |
|---|---|
| Vertex | Tensor |
| Edge weight | Bond dimension (log2) |
| Partition | Contraction subtree |
| Edge cut | Inter-partition bond cost |
| Balanced partition | Balanced contraction tree |

Phase 1 (this ADR): Use `ruvector-mincut` for community detection in contraction path optimization.

Phase 2 (future): Extend `ruvector-mincut` with hypergraph partitioning for multi-index tensor contractions, enabling higher-order tensor networks (e.g., PEPS for 2D circuits).

## Consequences

### Positive

1. **Dramatically expanded qubit range**: Shallow circuits on 40-60 qubits become tractable on commodity hardware.
2. **Surface code simulation**: Distance-7 surface codes (49 data + 48 ancilla = 97 qubits) can be simulated for decoder validation using MPS (the circuit is geometrically local).
3. **Unified interface**: Users write circuits once; backend selection is automatic.
4. **Synergy with ruvector-mincut**: Leverages the existing graph partitioning investment.
5. **Complementary to state vector**: Each backend covers the other's weakness.

### Negative

1. **Implementation complexity**: Tensor contraction, SVD truncation, and path optimization are non-trivial to implement correctly and efficiently.
2. **Approximation risk**: MPS truncation introduces controlled but nonzero error. Users must understand fidelity estimates.
3. **Compilation time**: The `ndarray` and `petgraph` dependencies add to compile time when the feature is enabled.
4. **Testing surface**: Two backends double the testing matrix for correctness validation.
5. **Performance unpredictability**: Contraction cost depends on circuit structure in ways that are hard to predict without running the path optimizer.

### Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Path optimizer finds poor ordering | Medium | High cost | Multiple heuristics + timeout fallback to greedy |
| MPS fidelity silently degrades | Medium | Incorrect results | Track discarded weight; warn if fidelity < 0.99 |
| Feature interaction bugs | Low | Incorrect results | Shared test suite: both backends must agree on small circuits |
| Memory spike during contraction | Medium | OOM | Pre-estimate peak intermediate tensor size; abort if too large |

## References

- QuantRS2 tensor network implementation: internal reference
- Markov & Shi, "Simulating Quantum Computation by Contracting Tensor Networks" (2008)
- Gray & Kourtis, "Hyper-optimized tensor network contraction" (2021) -- cotengra
- Schollwöck, "The density-matrix renormalization group in the age of matrix product states" (2011)
- ADR-QE-001: Core Engine Architecture (state vector backend)
- ADR-QE-005: WASM Compilation Target
- `ruvector-mincut` crate documentation
- ADR-014: Coherence Engine (graph partitioning reuse)