# ADR-QE-009: Tensor Network Evaluation Mode

**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board

---

## Context

Full state-vector simulation stores all 2^n complex amplitudes explicitly, yielding
O(2^n) memory and O(G * 2^n) time for G gates. At 16 bytes per amplitude, n=30
requires 16 GiB and n=40 requires 16 TiB. Many practically interesting circuits,
however, contain limited entanglement:

| Circuit family | Entanglement structure | Treewidth |
|---|---|---|
| Shallow QAOA on sparse graphs | Bounded by graph degree | Low (often < 20) |
| Separate-register circuits | Disjoint qubit subsets | Sum of sub-widths |
| Near-Clifford circuits | Stabilizer + few T gates | Depends on T count |
| 1D brickwork (finite depth) | Area-law entanglement | O(depth) |
| Random deep circuits (all-to-all) | Volume-law entanglement | O(n) -- no gain |

For the first four families, tensor network (TN) methods can trade increased
computation for drastically reduced memory by representing each gate as a tensor and
contracting the resulting network in an optimized order. The contraction cost scales
exponentially in the *treewidth* of the circuit's line graph rather than in the total
qubit count.

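The state-vector memory figures at the top of this section can be reproduced with a few lines of arithmetic. The sketch below assumes 16 bytes per amplitude (two `f64` components); the function names are illustrative and not part of `ruqu-core`.

```rust
/// Bytes per amplitude, assuming two f64 components (complex128).
const BYTES_PER_AMPLITUDE: u128 = 16;

/// Memory needed for a full state vector of `n` qubits.
/// Returns `None` when the byte count does not fit in a `u128`.
pub fn state_vector_bytes(n: u32) -> Option<u128> {
    1_u128.checked_shl(n)?.checked_mul(BYTES_PER_AMPLITUDE)
}

/// True when the full state vector fits into `available_bytes`.
pub fn fits_in_memory(n: u32, available_bytes: u128) -> bool {
    state_vector_bytes(n).map_or(false, |bytes| bytes <= available_bytes)
}
```

For example, `state_vector_bytes(30)` yields 16 GiB and `state_vector_bytes(40)` yields 16 TiB, so a machine with 32 GiB of RAM passes `fits_in_memory` at n=30 but not at n=40.
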
QuantRS2 (the Rust quantum simulation reference) demonstrated tensor network
contraction for circuits up to 60 qubits on commodity hardware when treewidth
remained below ~25. ruVector's existing `ruvector-mincut` crate already solves graph
partitioning problems that are structurally identical to contraction-order
optimization, providing a natural integration point.

The ruQu engine needs this capability to support:

1. Surface code simulations at distance d >= 7 (49+ data qubits) for decoder
   validation, where the syndrome extraction circuit is shallow and geometrically
   local.
2. Variational algorithm prototyping (VQE, QAOA) on graphs larger than 30 nodes.
3. Hybrid workflows where part of the circuit is simulated via state vector and part
   via tensor contraction.

## Decision

### 1. Feature-Gated Backend

Tensor network evaluation is implemented as an optional backend behind the
`tensor-network` feature flag in `ruqu-core`:

```toml
# ruqu-core/Cargo.toml
[features]
default = ["state-vector"]
state-vector = []
tensor-network = ["dep:ndarray", "dep:petgraph"]
all-backends = ["state-vector", "tensor-network"]
```

When both backends are compiled in, the engine selects the backend at runtime based
on circuit analysis (see Section 5 below).

### 2. Tensor Representation

Every gate becomes a tensor connecting the qubit wire indices it acts on:

| Gate type | Tensor rank | Shape | Index structure |
|---|---|---|---|
| Single-qubit (H, X, Rz, ...) | 2 | [2, 2] | Input wire -> output wire |
| Two-qubit (CNOT, CZ, ...) | 4 | [2, 2, 2, 2] | Two input wires -> two output wires |
| Three-qubit (Toffoli) | 6 | [2, 2, 2, 2, 2, 2] | Three input wires -> three output wires |
| Measurement projector | 2 | [2, 2] | Diagonal in computational basis |
| Initial state `\|0>` | 1 | [2] | Single output wire |

The circuit is converted into a tensor network graph where:
- Each tensor is a node.
- Each shared index (qubit wire between consecutive gates) is an edge.
- Open indices are the wires left uncontracted, such as the outputs of the final
  measurement tensors. Input wires are closed by the rank-1 initial-state tensors.

```
|0>---[H]---[CNOT_ctrl]---[Rz]---<meas>
                 |
|0>-----------[CNOT_tgt]---------<meas>
```

Becomes:

```
Node: init_0 (rank 1)
    |
Node: H_0 (rank 2)
    |
Node: CNOT_01 (rank 4)
   / \
  |   Node: Rz_0 (rank 2)
  |     |
  |   Node: meas_0 (rank 2)
  |
Node: init_1 (rank 1)
  ... (connected via CNOT shared index)
Node: meas_1 (rank 2)
```

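The node list above can be written down as plain data to check the bookkeeping. This is a minimal sketch, not the `ruqu-core` representation: the `Node` type, the integer index ids, and both function names are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// A tensor node: a label plus the ids of the wire indices it carries.
/// The tensor rank is `indices.len()`.
#[derive(Debug)]
pub struct Node {
    pub label: &'static str,
    pub indices: Vec<u32>,
}

/// The two-qubit example circuit: H and Rz on qubit 0, one CNOT, and a
/// measurement tensor per qubit. Ids 0..=5 are shared wires; 6 and 7 are
/// the open output wires after measurement.
pub fn example_network() -> Vec<Node> {
    vec![
        Node { label: "init_0", indices: vec![0] },
        Node { label: "H_0", indices: vec![0, 1] },
        Node { label: "CNOT_01", indices: vec![1, 4, 2, 5] },
        Node { label: "Rz_0", indices: vec![2, 3] },
        Node { label: "meas_0", indices: vec![3, 6] },
        Node { label: "init_1", indices: vec![4] },
        Node { label: "meas_1", indices: vec![5, 7] },
    ]
}

/// Splits index ids into (shared, open). An index carried by two nodes is
/// an edge of the network; an index carried by one node is open.
pub fn classify_indices(nodes: &[Node]) -> (Vec<u32>, Vec<u32>) {
    let mut count: BTreeMap<u32, usize> = BTreeMap::new();
    for node in nodes {
        for &ix in &node.indices {
            *count.entry(ix).or_insert(0) += 1;
        }
    }
    let mut shared = Vec::new();
    let mut open = Vec::new();
    for (ix, uses) in count {
        match uses {
            1 => open.push(ix),
            2 => shared.push(ix),
            _ => {} // would indicate a malformed network
        }
    }
    (shared, open)
}
```

A useful sanity check is that the ranks sum to twice the number of edges plus the number of open indices: here 1 + 2 + 4 + 2 + 2 + 1 + 2 = 14 = 2 * 6 + 2.
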
### 3. Contraction Strategy

Contraction order determines whether the computation is tractable. The cost of
contracting two tensors is the product of the dimensions of all indices involved.
Finding the optimal contraction order is NP-hard (equivalent to finding minimum
treewidth), so we use heuristics.

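The pairwise cost rule can be made concrete with a small helper. The sketch below describes a tensor only by its index labels and dimensions; the `Indices` alias and both function names are hypothetical and chosen for this illustration.

```rust
use std::collections::BTreeMap;

/// A tensor described only by its index labels and their dimensions.
pub type Indices = BTreeMap<&'static str, u64>;

/// Cost of contracting `a` with `b`: the product of the dimensions of
/// every distinct index that appears on either tensor.
pub fn contraction_cost(a: &Indices, b: &Indices) -> u64 {
    let mut all = a.clone();
    all.extend(b.iter().map(|(k, v)| (*k, *v)));
    all.values().product()
}

/// Indices of the result: those that appear on exactly one input.
/// Indices present on both inputs are summed over and disappear.
pub fn result_indices(a: &Indices, b: &Indices) -> Indices {
    let mut out = Indices::new();
    for (k, v) in a.iter().chain(b.iter()) {
        if a.contains_key(k) != b.contains_key(k) {
            out.insert(*k, *v);
        }
    }
    out
}
```

Contracting `A[i, j]` with `B[j, k]`, all of dimension 2, costs 2 * 2 * 2 = 8 and leaves a tensor with indices `i` and `k`, which matches ordinary matrix multiplication.
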
#### Contraction Path Optimization Pseudocode

```
function find_contraction_path(tensor_network: TN) -> ContractionPath:
    // Phase 1: Simplify the network
    apply_trivial_contractions(tensor_network)  // rank-1 tensors, diagonal pairs

    // Phase 2: Detect community structure
    communities = detect_communities(tensor_network.graph)

    // Phase 3: Contract within communities first (small subproblems)
    intra_paths = []
    for community in communities:
        subgraph = tensor_network.subgraph(community)
        if subgraph.num_tensors <= 20:
            // Exact dynamic programming for small subgraphs
            path = optimal_einsum_dp(subgraph)
        else:
            // Greedy with lookahead for larger subgraphs
            path = greedy_with_lookahead(subgraph, lookahead=2)
        intra_paths.append(path)

    // Phase 4: Contract inter-community edges
    // Each community is now a single large tensor
    meta_graph = contract_communities(tensor_network, intra_paths)
    inter_path = greedy_with_lookahead(meta_graph, lookahead=3)

    // Phase 5: Compose the full path
    return compose_paths(intra_paths, inter_path)


function greedy_with_lookahead(tn: TN, lookahead: int) -> Path:
    path = []
    remaining = tn.clone()

    while remaining.num_tensors > 1:
        best_cost = INFINITY
        best_pair = None

        // Evaluate all candidate contractions
        for (i, j) in remaining.candidate_pairs():
            cost = contraction_cost(remaining, i, j)

            // Lookahead: estimate cost of subsequent contractions
            if lookahead > 0:
                simulated = remaining.simulate_contraction(i, j)
                future_cost = estimate_future_cost(simulated, lookahead - 1)
                cost += future_cost * DISCOUNT_FACTOR

            if cost < best_cost:
                best_cost = cost
                best_pair = (i, j)

        path.append(best_pair)
        remaining.contract(best_pair)

    return path
```

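A runnable version of the greedy loop helps show what a contraction path looks like in practice. The sketch below is a simplification under stated assumptions: no lookahead, every index has dimension 2 (so cost is tracked as a log2 count of distinct indices), and tensors are plain sets of index ids. None of these names come from `ruqu-core`.

```rust
use std::collections::BTreeSet;

/// A tensor is the set of index ids it carries; every index has dimension 2.
pub type Tensor = BTreeSet<u32>;

/// log2 of the pairwise contraction cost: the number of distinct indices.
fn log2_cost(a: &Tensor, b: &Tensor) -> usize {
    a.union(b).count()
}

/// Greedy pairwise contraction without lookahead. Returns the chosen pairs
/// (positions in the working list at the time of each step) and the largest
/// log2 cost seen along the way.
pub fn greedy_path(mut tensors: Vec<Tensor>) -> (Vec<(usize, usize)>, usize) {
    let mut path = Vec::new();
    let mut peak = 0;
    while tensors.len() > 1 {
        let mut best: Option<(usize, usize, usize)> = None; // (cost, i, j)
        // First pass: only pairs sharing an index. Second pass: any pair,
        // used when the remaining tensors are disconnected.
        for connected_only in [true, false] {
            for i in 0..tensors.len() {
                for j in (i + 1)..tensors.len() {
                    if connected_only && tensors[i].is_disjoint(&tensors[j]) {
                        continue;
                    }
                    let cost = log2_cost(&tensors[i], &tensors[j]);
                    if best.map_or(true, |(c, _, _)| cost < c) {
                        best = Some((cost, i, j));
                    }
                }
            }
            if best.is_some() {
                break;
            }
        }
        let (cost, i, j) = best.expect("at least two tensors remain");
        peak = peak.max(cost);
        let b = tensors.remove(j); // j > i, so position i stays valid
        let a = tensors.remove(i);
        // Shared indices are summed over; the rest survive on the result.
        tensors.push(a.symmetric_difference(&b).copied().collect());
        path.push((i, j));
    }
    (path, peak)
}
```

For a chain of three matrices with index sets `{0,1}`, `{1,2}`, `{2,3}`, the function performs two contractions, each touching three distinct indices, so the peak cost is 2^3. Adding lookahead would replace the plain `cost` comparison with a cost that also includes an estimate for the following steps.
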
#### Community Detection via ruvector-mincut

The `ruvector-mincut` crate provides graph partitioning that is directly applicable
to contraction ordering:

```rust
use ruvector_mincut::{partition, Objective, PartitionConfig};

fn partition_tensor_network(tn: &TensorNetwork) -> Vec<Vec<TensorId>> {
    let graph = tn.to_adjacency_graph();
    let config = PartitionConfig {
        num_partitions: estimate_optimal_partitions(tn),
        balance_factor: 1.1,          // Allow 10% imbalance
        minimize: Objective::EdgeCut, // Minimize inter-partition wires
    };
    partition(&graph, &config)
}
```

The number of cut edges is the log2 of the bond dimension of the inter-community
contraction (each cut qubit wire contributes a factor of 2), so minimizing edge cut
minimizes the most expensive contraction step.

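The relation between a partition's edge cut and the resulting bond dimension can be sketched directly. The functions below are illustrative helpers, not part of `ruvector-mincut`, and assume every cut edge is a qubit wire of dimension 2.

```rust
/// Counts edges whose endpoints fall in different partitions.
/// `assignment[v]` is the partition id of tensor `v`.
pub fn edge_cut(edges: &[(usize, usize)], assignment: &[usize]) -> u32 {
    edges
        .iter()
        .filter(|&&(u, v)| assignment[u] != assignment[v])
        .count() as u32
}

/// Bond dimension implied by cutting `cut` wires of dimension 2 each.
/// Returns `None` if 2^cut does not fit in a `u64`.
pub fn bond_dimension(cut: u32) -> Option<u64> {
    1_u64.checked_shl(cut)
}
```

Splitting a 4-cycle of tensors into two pairs cuts two edges, so the two halves are joined by a bond of dimension 4; a partition that cut three edges would double that.
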
### 4. MPS (Matrix Product State) Mode

For circuits with 1D-like connectivity (nearest-neighbor gates on a line), a Matrix
Product State representation is more efficient than general tensor contraction.

```
A[1] -- A[2] -- A[3] -- ... -- A[n]
 |       |       |              |
phys_1  phys_2  phys_3        phys_n
```

Each site tensor A[i] has shape `[bond_left, physical, bond_right]` where:
- `physical` = 2 (qubit dimension)
- `bond_left`, `bond_right` = bond dimension chi

At 16 bytes per complex amplitude, a site tensor occupies 2 * chi^2 * 16 bytes:

| Bond dimension (chi) | Memory per site | Total memory (n qubits) | Approximation |
|---|---|---|---|
| 1 | 32 bytes | 32n bytes | Product state only |
| 16 | 8 KiB | 8n KiB | Low entanglement |
| 64 | 128 KiB | 128n KiB | Moderate entanglement |
| 256 | 2 MiB | 2n MiB | High entanglement |
| 1024 | 32 MiB | 32n MiB | Near exact for many circuits |

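The per-site memory follows from the tensor shape alone. The sketch below keeps the amplitude width as a parameter, since that is an assumption of the estimate rather than something fixed by the MPS format; the function names are illustrative.

```rust
/// Bytes for one site tensor of shape [chi, 2, chi].
/// `bytes_per_amplitude` is 16 for complex128 and 8 for complex64.
pub fn mps_site_bytes(chi: u64, bytes_per_amplitude: u64) -> u64 {
    2 * chi * chi * bytes_per_amplitude
}

/// Upper bound on total MPS memory for `n` sites. Boundary sites have a
/// bond of dimension 1 on one side, so the true total is slightly lower.
pub fn mps_total_bytes(n: u64, chi: u64, bytes_per_amplitude: u64) -> u64 {
    n * mps_site_bytes(chi, bytes_per_amplitude)
}
```

With 16-byte amplitudes, chi = 16 gives 8 KiB per site and chi = 1024 gives 32 MiB per site. A 50-site chain at chi = 256 stays near 100 MiB, which is the reason MPS remains practical where a 50-qubit state vector is not.
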
**Truncation policy**: After each two-qubit gate, perform SVD on the updated bond.
If the bond dimension exceeds `chi_max`, truncate the smallest singular values.
Track the total discarded weight (sum of squared discarded singular values) as a
fidelity estimate:

```rust
pub struct MpsConfig {
    /// Maximum bond dimension. Truncation occurs above this.
    pub chi_max: usize,
    /// Minimum singular value to retain (relative to largest).
    pub svd_cutoff: f64,
    /// Running fidelity estimate. Starts at 1.0 and decreases as truncation
    /// discards weight (updated during simulation).
    pub fidelity_estimate: f64,
}

impl Default for MpsConfig {
    fn default() -> Self {
        Self {
            chi_max: 256,
            svd_cutoff: 1e-12,
            fidelity_estimate: 1.0,
        }
    }
}
```

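The truncation rule can be sketched independently of any SVD routine by operating on the singular values alone. The code below is an illustration under assumptions: the singular values arrive sorted in descending order, and the fidelity update multiplies the running estimate by the retained fraction of the squared weight. The ADR does not fix that update formula, and the names here are hypothetical.

```rust
/// Outcome of truncating one bond.
pub struct Truncation {
    /// Number of singular values retained.
    pub kept: usize,
    /// Sum of squared singular values that were dropped.
    pub discarded_weight: f64,
}

/// Keeps at most `chi_max` singular values and drops any that are not
/// larger than `svd_cutoff` times the largest one.
/// `singular_values` must be sorted in descending order.
pub fn truncate_bond(singular_values: &[f64], chi_max: usize, svd_cutoff: f64) -> Truncation {
    let largest = singular_values.first().copied().unwrap_or(0.0);
    let kept = singular_values
        .iter()
        .take(chi_max)
        .take_while(|&&s| s > svd_cutoff * largest)
        .count();
    let discarded_weight: f64 = singular_values[kept..].iter().map(|s| s * s).sum();
    Truncation { kept, discarded_weight }
}

/// Assumed update rule: scale the running fidelity by the fraction of
/// squared weight that survived this truncation.
pub fn update_fidelity(fidelity: f64, singular_values: &[f64], t: &Truncation) -> f64 {
    let total: f64 = singular_values.iter().map(|s| s * s).sum();
    if total == 0.0 {
        fidelity
    } else {
        fidelity * (1.0 - t.discarded_weight / total)
    }
}
```

With singular values `[0.8, 0.6, 0.0]` and `chi_max = 1`, the second value is discarded, the discarded weight is 0.36, and the fidelity estimate drops from 1.0 to about 0.64. With `chi_max = 8` only the exact zero is removed and the estimate is unchanged.
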
### 5. Automatic Mode Selection

The engine analyzes the circuit before execution to recommend a backend:

```rust
pub enum RecommendedBackend {
    StateVector { reason: &'static str },
    TensorNetwork { estimated_treewidth: usize, reason: &'static str },
    Mps { estimated_max_bond: usize, reason: &'static str },
}

pub fn recommend_backend(circuit: &QuantumCircuit) -> RecommendedBackend {
    let n = circuit.num_qubits();
    let depth = circuit.depth();
    let connectivity = circuit.connectivity_graph();

    // Rule 1: Small circuits always use state vector
    if n <= 20 {
        return RecommendedBackend::StateVector {
            reason: "Small circuit; state vector is fastest at or below 20 qubits",
        };
    }

    // Rule 2: Check for 1D connectivity (MPS candidate)
    if connectivity.max_degree() <= 2 && connectivity.is_path_graph() {
        let estimated_bond = 2_usize.pow(depth.min(20) as u32);
        return RecommendedBackend::Mps {
            estimated_max_bond: estimated_bond,
            reason: "1D nearest-neighbor connectivity detected",
        };
    }

    // Rule 3: Estimate treewidth for general TN
    let estimated_tw = estimate_treewidth(&connectivity, depth);
    if estimated_tw < 25 && n > 25 {
        return RecommendedBackend::TensorNetwork {
            estimated_treewidth: estimated_tw,
            reason: "Low treewidth relative to qubit count",
        };
    }

    // Rule 4: Check memory feasibility for state vector.
    // 16 bytes per amplitude. Checked u128 arithmetic avoids the overflow
    // that `16 * (1_usize << n)` hits for n >= 60; `None` means the size
    // is not representable and is treated as "does not fit".
    let sv_memory = u32::try_from(n)
        .ok()
        .and_then(|bits| 1_u128.checked_shl(bits))
        .and_then(|amplitudes| amplitudes.checked_mul(16));
    let available = estimate_available_memory() as u128;
    if sv_memory.map_or(true, |bytes| bytes > available) {
        // Force TN even if treewidth is high -- at least it has a chance
        return RecommendedBackend::TensorNetwork {
            estimated_treewidth: estimated_tw,
            reason: "State vector exceeds available memory; TN is only option",
        };
    }

    RecommendedBackend::StateVector {
        reason: "High treewidth circuit; state vector is more efficient",
    }
}
```

### 6. When Tensor Networks Win vs Lose

**Tensor networks win when:**

| Scenario | Why TN wins | Example |
|---|---|---|
| Shallow circuits on many qubits | Treewidth ~ depth, not n | 50-qubit depth-4 QAOA |
| Sparse graph connectivity | Low treewidth from graph structure | MaxCut on 3-regular graph |
| Separate registers | Independent contractions | n/2 Bell pairs |
| Near-Clifford | Stabilizer + few non-Clifford gates | Clifford + 5 T gates |
| Amplitude computation | Contract to single output, not full state | Sampling one bitstring |

**Tensor networks lose when:**

| Scenario | Why TN loses | Fallback |
|---|---|---|
| Deep random circuits | Treewidth ~ n | State vector (if n <= 30) |
| All-to-all connectivity | No structure to exploit | State vector |
| Full state tomography needed | Must contract once per amplitude | State vector |
| Very small circuits (n <= 20) | Overhead exceeds state vector | State vector |
| High-fidelity MPS needed | Bond dimension grows exponentially | State vector or exact TN |

### 7. Example: 50-Qubit Shallow QAOA

Consider QAOA depth p=1 on a 50-node 3-regular graph:

```
Circuit structure:
  - 50 qubits, initialized to |+> (50 H gates)
  - 75 ZZ gates (one per edge), parameterized by gamma
  - 50 Rx gates, parameterized by beta
  - Total: 50 + 75 + 50 = 175 gates
  - Circuit depth: 4 (H layer, ZZ layer (3-colorable), Rx layer, measure)

Graph treewidth of 3-regular graph: typically 8-15

Tensor network contraction:
  - Community detection finds ~5-8 communities of 6-10 nodes
  - Intra-community contraction: O(2^10) ~ 1024 per community
  - Inter-community bonds: ~15 edges cut
  - Effective contraction complexity: O(2^15) = 32768
  - Compare to state vector: O(2^50) = 1.1 * 10^15

Memory comparison:
  - State vector: 2^50 * 16 bytes = 16 PiB (impossible)
  - Tensor network: ~100 MiB working memory
  - Speedup factor: not meaningful (feasible vs infeasible)
```

```
Contraction Diagram (simplified):

  Community A       Community B         Community C
    [q0-q9]          [q10-q19]           [q20-q29]
       |                 |                   |
       +--- bond=2^3 ----+---- bond=2^4 -----+
                         |
  Community D       Community E
   [q30-q39]         [q40-q49]
       |                 |
       +--- bond=2^3 ----+

Peak intermediate tensor: 2^15 elements = 512 KiB
```

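The gate counts in this example follow from the graph parameters, which makes them easy to recompute for other sizes. The helper below is a sketch with hypothetical names; it assumes one H per qubit for the initial layer and, per QAOA layer, one ZZ gate per edge plus one Rx per qubit.

```rust
/// Edge and gate counts for QAOA on a regular graph.
#[derive(Debug, PartialEq)]
pub struct QaoaCounts {
    pub edges: u64,
    pub total_gates: u64,
}

/// Counts for `layers` QAOA layers on a `degree`-regular graph with
/// `nodes` vertices. Returns `None` if no such graph exists (odd number
/// of edge endpoints) or if the arithmetic overflows.
pub fn qaoa_counts(nodes: u64, degree: u64, layers: u64) -> Option<QaoaCounts> {
    let endpoints = nodes.checked_mul(degree)?;
    if endpoints % 2 != 0 {
        return None;
    }
    let edges = endpoints / 2;
    let per_layer = edges.checked_add(nodes)?; // ZZ gates + Rx gates
    let total_gates = layers.checked_mul(per_layer)?.checked_add(nodes)?; // + H layer
    Some(QaoaCounts { edges, total_gates })
}
```

For 50 nodes, degree 3, and p=1 this gives 75 edges and 175 gates, matching the circuit structure listed above.
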
### 8. Integration with State Vector Backend

Both backends implement the same trait:

```rust
pub trait SimulationBackend {
    /// Execute the circuit and return measurement results.
    fn execute(
        &self,
        circuit: &QuantumCircuit,
        shots: usize,
        config: &SimulationConfig,
    ) -> Result<SimulationResult, SimulationError>;

    /// Compute expectation value of an observable.
    fn expectation_value(
        &self,
        circuit: &QuantumCircuit,
        observable: &Observable,
        config: &SimulationConfig,
    ) -> Result<f64, SimulationError>;

    /// Return the backend name for logging.
    fn name(&self) -> &'static str;
}
```

Users interact through `QuantumCircuit` and never need to know which backend is
active:

```rust
let circuit = QuantumCircuit::new(50)
    .h_all()
    .append_qaoa_layer(graph, gamma, beta)
    .measure_all();

// Automatic backend selection
let result = ruqu::execute(&circuit, 1000)?;
// -> Internally selects TensorNetwork backend due to n=50, low treewidth

// Or explicit backend override
let result = ruqu::execute_with_backend(
    &circuit,
    1000,
    Backend::TensorNetwork(TnConfig::default()),
)?;
```

### 9. Future: ruvector-mincut Integration for Contraction Ordering

The `ruvector-mincut` crate currently solves balanced graph partitioning for vector
index sharding. The same algorithm directly applies to tensor network contraction
ordering via the following correspondence:

| Graph partitioning concept | TN contraction concept |
|---|---|
| Vertex | Tensor |
| Edge weight | Bond dimension (log2) |
| Partition | Contraction subtree |
| Edge cut | Inter-partition bond cost |
| Balanced partition | Balanced contraction tree |

Phase 1 (this ADR): Use `ruvector-mincut` for community detection in contraction
path optimization.

Phase 2 (future): Extend `ruvector-mincut` with hypergraph partitioning for
multi-index tensor contractions, enabling handling of higher-order tensor networks
(e.g., PEPS for 2D circuits).

## Consequences

### Positive

1. **Dramatically expanded qubit range**: Shallow circuits on 40-60 qubits become
   tractable on commodity hardware.
2. **Surface code simulation**: Distance-7 surface codes (49 data + 48 ancilla = 97
   qubits) can be simulated for decoder validation using MPS (the circuit is
   geometrically local).
3. **Unified interface**: Users write circuits once; backend selection is automatic.
4. **Synergy with ruvector-mincut**: Leverages existing graph partitioning
   investment.
5. **Complementary to state vector**: Each backend covers the other's weakness.

### Negative

1. **Implementation complexity**: Tensor contraction, SVD truncation, and path
   optimization are non-trivial to implement correctly and efficiently.
2. **Approximation risk**: MPS truncation introduces controlled but nonzero error.
   Users must understand fidelity estimates.
3. **Compilation time**: The `ndarray` and `petgraph` dependencies add to compile
   time when the feature is enabled.
4. **Testing surface**: Two backends double the testing matrix for correctness
   validation.
5. **Performance unpredictability**: Contraction cost depends on circuit structure
   in ways that are hard to predict without running the path optimizer.

### Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Path optimizer finds poor ordering | Medium | High cost | Multiple heuristics + timeout fallback to greedy |
| MPS fidelity silently degrades | Medium | Incorrect results | Track discarded weight; warn if fidelity < 0.99 |
| Feature interaction bugs | Low | Incorrect results | Shared test suite: both backends must agree on small circuits |
| Memory spike during contraction | Medium | OOM | Pre-estimate peak intermediate tensor size; abort if too large |

## References

- QuantRS2 tensor network implementation: internal reference
- Markov & Shi, "Simulating Quantum Computation by Contracting Tensor Networks" (2008)
- Gray & Kourtis, "Hyper-optimized tensor network contraction" (2021) -- cotengra
- Schollwöck, "The density-matrix renormalization group in the age of matrix product states" (2011)
- ADR-QE-001: Core Engine Architecture (state vector backend)
- ADR-QE-005: WASM Compilation Target
- `ruvector-mincut` crate documentation
- ADR-014: Coherence Engine (graph partitioning reuse)