Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
docs/adr/quantum-engine/ADR-QE-009-tensor-network-evaluation.md (new file, 480 lines)
# ADR-QE-009: Tensor Network Evaluation Mode

**Status**: Proposed

**Date**: 2026-02-06

**Authors**: ruv.io, RuVector Team

**Deciders**: Architecture Review Board

---
## Context

Full state-vector simulation stores all 2^n complex amplitudes explicitly, yielding O(2^n) memory and O(G * 2^n) time for G gates. At n=30 this is 16 GiB; at n=40 it reaches 16 TiB. Many practically interesting circuits, however, contain limited entanglement:

| Circuit family | Entanglement structure | Treewidth |
|---|---|---|
| Shallow QAOA on sparse graphs | Bounded by graph degree | Low (often < 20) |
| Separate-register circuits | Disjoint qubit subsets | Sum of sub-widths |
| Near-Clifford circuits | Stabilizer + few T gates | Depends on T count |
| 1D brickwork (finite depth) | Area-law entanglement | O(depth) |
| Random deep circuits (all-to-all) | Volume-law entanglement | O(n) -- no gain |

For the first four families, tensor network (TN) methods can trade increased computation for drastically reduced memory by representing each gate as a tensor and contracting the resulting network in an optimized order. The contraction cost scales exponentially in the *treewidth* of the circuit's line graph rather than in the total qubit count.

QuantRS2 (the Rust quantum simulation reference) demonstrated tensor network contraction for circuits up to 60 qubits on commodity hardware when treewidth remained below ~25. ruVector's existing `ruvector-mincut` crate already solves graph partitioning problems that are structurally identical to contraction-order optimization, providing a natural integration point.

The ruQu engine needs this capability to support:

1. Surface code simulations at distance d >= 7 (49+ data qubits) for decoder validation, where the syndrome extraction circuit is shallow and geometrically local.
2. Variational algorithm prototyping (VQE, QAOA) on graphs larger than 30 nodes.
3. Hybrid workflows where part of the circuit is simulated via state vector and part via tensor contraction.
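The memory scaling in the Context section can be checked with a one-line helper; `sv_bytes` is a hypothetical name used only for this sketch, not part of the engine:

```rust
/// Bytes needed to store all 2^n amplitudes at 16 bytes (complex128) each.
/// u128 keeps the shift from overflowing for any realistic qubit count.
fn sv_bytes(n: u32) -> u128 {
    16u128 << n
}

fn main() {
    // n=30 -> 16 GiB, n=40 -> 16 TiB, as stated above.
    assert_eq!(sv_bytes(30) >> 30, 16); // GiB
    assert_eq!(sv_bytes(40) >> 40, 16); // TiB
    println!("n=30: {} GiB, n=40: {} TiB", sv_bytes(30) >> 30, sv_bytes(40) >> 40);
}
```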
## Decision

### 1. Feature-Gated Backend

Tensor network evaluation is implemented as an optional backend behind the `tensor-network` feature flag in `ruqu-core`:

```toml
# ruqu-core/Cargo.toml
[features]
default = ["state-vector"]
state-vector = []
tensor-network = ["dep:ndarray", "dep:petgraph"]
all-backends = ["state-vector", "tensor-network"]
```

When both backends are compiled in, the engine selects the backend at runtime based on circuit analysis (see Section 5 below).
### 2. Tensor Representation

Every gate becomes a tensor connecting the qubit wire indices it acts on:

| Gate type | Tensor rank | Shape | Example |
|---|---|---|---|
| Single-qubit (H, X, Rz, ...) | 2 | [2, 2] | Input wire -> output wire |
| Two-qubit (CNOT, CZ, ...) | 4 | [2, 2, 2, 2] | Two input wires -> two output wires |
| Three-qubit (Toffoli) | 6 | [2, 2, 2, 2, 2, 2] | Three input -> three output |
| Measurement projector | 2 | [2, 2] | Diagonal in computational basis |
| Initial state \|0> | 1 | [2] | Single output wire |

The circuit is converted into a tensor network graph where:

- Each tensor is a node.
- Each shared index (qubit wire between consecutive gates) is an edge.
- Open indices represent initial states and final measurement outcomes.

```
|0>---[H]---[CNOT_ctrl]---[Rz]---<meas>
                 |
|0>-----------[CNOT_tgt]---------<meas>
```

Becomes:

```
Node: init_0 (rank 1)
         |
Node: H_0 (rank 2)
         |
Node: CNOT_01 (rank 4)
        /  \
       |    Node: Rz_0 (rank 2)
       |         |
       |    Node: meas_0 (rank 2)
       |
Node: init_1 (rank 1)
   ... (connected via CNOT shared index)
Node: meas_1 (rank 2)
```
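As a concrete sketch of the rank-4 row in the table above, a CNOT can be written as a plain nested array rather than the engine's real tensor type; the index ordering `[c_out][t_out][c_in][t_in]` is an assumption made for illustration:

```rust
/// CNOT as a rank-4 tensor of shape [2, 2, 2, 2], indexed as
/// t[c_out][t_out][c_in][t_in]. The entry is 1.0 exactly where the gate
/// maps basis state |c_in t_in> to |c_out t_out>.
fn cnot_tensor() -> [[[[f64; 2]; 2]; 2]; 2] {
    let mut t = [[[[0.0; 2]; 2]; 2]; 2];
    for c in 0..2usize {
        for tq in 0..2usize {
            // Control passes through; target flips iff control is 1.
            t[c][tq ^ c][c][tq] = 1.0;
        }
    }
    t
}

fn main() {
    let t = cnot_tensor();
    assert_eq!(t[1][1][1][0], 1.0); // |10> -> |11>
    assert_eq!(t[0][0][0][0], 1.0); // |00> -> |00>
    assert_eq!(t[1][0][1][0], 0.0); // |10> -> |10> never happens
}
```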
### 3. Contraction Strategy

Contraction order determines whether the computation is tractable. The cost of contracting two tensors is the product of the dimensions of all indices involved. Finding the optimal contraction order is NP-hard (equivalent to finding minimum treewidth), so we use heuristics.
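The pairwise cost rule (product of the dimensions of all indices involved, with shared indices counted once) can be sketched as a standalone helper; the label-to-dimension map is an illustrative representation, not the engine's data structure:

```rust
use std::collections::HashMap;

/// Cost of contracting two tensors, each described as a map from index
/// label to dimension. Shared labels appear once in the product, matching
/// "product of the dimensions of all indices involved".
fn contraction_cost(a: &HashMap<&str, usize>, b: &HashMap<&str, usize>) -> usize {
    let mut dims: HashMap<&str, usize> = a.clone();
    for (label, d) in b {
        dims.insert(*label, *d); // shared index overwrites, not duplicates
    }
    dims.values().product()
}

fn main() {
    // A has indices i (dim 2), j (dim 4); B has j (dim 4), k (dim 8).
    let a = HashMap::from([("i", 2), ("j", 4)]);
    let b = HashMap::from([("j", 4), ("k", 8)]);
    // Involved indices: i, j, k -> 2 * 4 * 8 = 64.
    assert_eq!(contraction_cost(&a, &b), 64);
}
```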
#### Contraction Path Optimization Pseudocode

```
function find_contraction_path(tensor_network: TN) -> ContractionPath:
    // Phase 1: Simplify the network
    apply_trivial_contractions(tensor_network)  // rank-1 tensors, diagonal pairs

    // Phase 2: Detect community structure
    communities = detect_communities(tensor_network.graph)

    // Phase 3: Contract within communities first (small subproblems)
    intra_paths = []
    for community in communities:
        subgraph = tensor_network.subgraph(community)
        if subgraph.num_tensors <= 20:
            // Exact dynamic programming for small subgraphs
            path = optimal_einsum_dp(subgraph)
        else:
            // Greedy with lookahead for larger subgraphs
            path = greedy_with_lookahead(subgraph, lookahead=2)
        intra_paths.append(path)

    // Phase 4: Contract inter-community edges
    // Each community is now a single large tensor
    meta_graph = contract_communities(tensor_network, intra_paths)
    inter_path = greedy_with_lookahead(meta_graph, lookahead=3)

    // Phase 5: Compose the full path
    return compose_paths(intra_paths, inter_path)


function greedy_with_lookahead(tn: TN, lookahead: int) -> Path:
    path = []
    remaining = tn.clone()

    while remaining.num_tensors > 1:
        best_cost = INFINITY
        best_pair = None

        // Evaluate all candidate contractions
        for (i, j) in remaining.candidate_pairs():
            cost = contraction_cost(remaining, i, j)

            // Lookahead: estimate cost of subsequent contractions
            if lookahead > 0:
                simulated = remaining.simulate_contraction(i, j)
                future_cost = estimate_future_cost(simulated, lookahead - 1)
                cost += future_cost * DISCOUNT_FACTOR

            if cost < best_cost:
                best_cost = cost
                best_pair = (i, j)

        path.append(best_pair)
        remaining.contract(best_pair)

    return path
```

#### Community Detection via ruvector-mincut

The `ruvector-mincut` crate provides graph partitioning that is directly applicable to contraction ordering:

```rust
use ruvector_mincut::{partition, PartitionConfig};

fn partition_tensor_network(tn: &TensorNetwork) -> Vec<Vec<TensorId>> {
    let graph = tn.to_adjacency_graph();
    let config = PartitionConfig {
        num_partitions: estimate_optimal_partitions(tn),
        balance_factor: 1.1,          // Allow 10% imbalance
        minimize: Objective::EdgeCut, // Minimize inter-partition wires
    };
    partition(&graph, &config)
}
```

The edge cut directly corresponds to the bond dimension of the inter-community contraction, so minimizing edge cut minimizes the most expensive contraction step.
### 4. MPS (Matrix Product State) Mode

For circuits with 1D-like connectivity (nearest-neighbor gates on a line), a Matrix Product State representation is more efficient than general tensor contraction.

```
A[1] -- A[2] -- A[3] -- ... -- A[n]
  |       |       |              |
phys_1  phys_2  phys_3         phys_n
```

Each site tensor A[i] has shape `[bond_left, physical, bond_right]` where:

- `physical` = 2 (qubit dimension)
- `bond_left`, `bond_right` = bond dimension chi

| Bond dimension (chi) | Memory per site | Total memory (n qubits) | Approximation |
|---|---|---|---|
| 1 | 16 bytes | 16n bytes | Product state only |
| 16 | 4 KiB | 4n KiB | Low entanglement |
| 64 | 64 KiB | 64n KiB | Moderate entanglement |
| 256 | 1 MiB | n MiB | High entanglement |
| 1024 | 16 MiB | 16n MiB | Near exact for many circuits |

**Truncation policy**: After each two-qubit gate, perform SVD on the updated bond. If the bond dimension exceeds `chi_max`, truncate the smallest singular values. Track the total discarded weight (sum of squared discarded singular values) as a fidelity estimate:

```rust
pub struct MpsConfig {
    /// Maximum bond dimension. Truncation occurs above this.
    pub chi_max: usize,
    /// Minimum singular value to retain (relative to largest).
    pub svd_cutoff: f64,
    /// Accumulated truncation error (updated during simulation).
    pub fidelity_estimate: f64,
}

impl Default for MpsConfig {
    fn default() -> Self {
        Self {
            chi_max: 256,
            svd_cutoff: 1e-12,
            fidelity_estimate: 1.0,
        }
    }
}
```
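The truncation policy reduces to bookkeeping on the list of singular values returned by the SVD (the decomposition itself is assumed to come from a linear-algebra crate); `truncate` is a hypothetical helper illustrating the `chi_max` / `svd_cutoff` / discarded-weight rules above:

```rust
/// Truncate a list of singular values to at most `chi_max` entries,
/// dropping values below `svd_cutoff` times the largest. Returns the kept
/// values and the discarded weight (sum of squared discarded values),
/// which feeds the running fidelity estimate.
fn truncate(mut svals: Vec<f64>, chi_max: usize, svd_cutoff: f64) -> (Vec<f64>, f64) {
    svals.sort_by(|x, y| y.partial_cmp(x).unwrap()); // descending order
    let largest = svals.first().copied().unwrap_or(0.0);
    let keep = svals
        .iter()
        .take(chi_max)
        .take_while(|s| **s >= svd_cutoff * largest)
        .count();
    let discarded: f64 = svals[keep..].iter().map(|s| s * s).sum();
    (svals[..keep].to_vec(), discarded)
}

fn main() {
    let (kept, w) = truncate(vec![0.8, 0.5, 0.3, 0.1], 2, 1e-12);
    assert_eq!(kept.len(), 2); // chi_max enforced
    assert!((w - (0.3 * 0.3 + 0.1 * 0.1)).abs() < 1e-12);
    // fidelity_estimate would then be scaled by (1 - w).
}
```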
### 5. Automatic Mode Selection

The engine analyzes the circuit before execution to recommend a backend:

```rust
pub enum RecommendedBackend {
    StateVector { reason: &'static str },
    TensorNetwork { estimated_treewidth: usize, reason: &'static str },
    Mps { estimated_max_bond: usize, reason: &'static str },
}

pub fn recommend_backend(circuit: &QuantumCircuit) -> RecommendedBackend {
    let n = circuit.num_qubits();
    let depth = circuit.depth();
    let connectivity = circuit.connectivity_graph();

    // Rule 1: Small circuits always use state vector
    if n <= 20 {
        return RecommendedBackend::StateVector {
            reason: "Small circuit; state vector is fastest below 20 qubits",
        };
    }

    // Rule 2: Check for 1D connectivity (MPS candidate)
    if connectivity.max_degree() <= 2 && connectivity.is_path_graph() {
        let estimated_bond = 2_usize.pow(depth.min(20) as u32);
        return RecommendedBackend::Mps {
            estimated_max_bond: estimated_bond,
            reason: "1D nearest-neighbor connectivity detected",
        };
    }

    // Rule 3: Estimate treewidth for general TN
    let estimated_tw = estimate_treewidth(&connectivity, depth);
    if estimated_tw < 25 && n > 25 {
        return RecommendedBackend::TensorNetwork {
            estimated_treewidth: estimated_tw,
            reason: "Low treewidth relative to qubit count",
        };
    }

    // Rule 4: Check memory feasibility for state vector
    // (u128 so the shift cannot overflow for large n)
    let sv_memory: u128 = 16u128 << n; // bytes
    let available = estimate_available_memory() as u128;
    if sv_memory > available {
        // Force TN even if treewidth is high -- at least it has a chance
        return RecommendedBackend::TensorNetwork {
            estimated_treewidth: estimated_tw,
            reason: "State vector exceeds available memory; TN is only option",
        };
    }

    RecommendedBackend::StateVector {
        reason: "High treewidth circuit; state vector is more efficient",
    }
}
```
### 6. When Tensor Networks Win vs Lose

**Tensor networks win when:**

| Scenario | Why TN wins | Example |
|---|---|---|
| Shallow circuits on many qubits | Treewidth ~ depth, not n | 50-qubit depth-4 QAOA |
| Sparse graph connectivity | Low treewidth from graph structure | MaxCut on 3-regular graph |
| Separate registers | Independent contractions | n/2 Bell pairs |
| Near-Clifford | Stabilizer + few non-Clifford gates | Clifford + 5 T gates |
| Amplitude computation | Contract to single output, not full state | Sampling one bitstring |

**Tensor networks lose when:**

| Scenario | Why TN loses | Fallback |
|---|---|---|
| Deep random circuits | Treewidth ~ n | State vector (if n <= 30) |
| All-to-all connectivity | No structure to exploit | State vector |
| Full state tomography needed | Must contract once per amplitude | State vector |
| Very small circuits (n < 20) | Overhead exceeds state vector | State vector |
| High-fidelity MPS needed | Bond dimension grows exponentially | State vector or exact TN |
### 7. Example: 50-Qubit Shallow QAOA

Consider QAOA depth p=1 on a 50-node 3-regular graph:

```
Circuit structure:
- 50 qubits, initialized to |+> (50 H gates)
- 75 ZZ gates (one per edge), parameterized by gamma
- 50 Rx gates, parameterized by beta
- Total: 125 + 50 = 175 gates
- Circuit depth: 4 (H layer, ZZ layer (3-colorable), Rx layer, measure)

Graph treewidth of 3-regular graph: typically 8-15

Tensor network contraction:
- Community detection finds ~5-8 communities of 6-10 nodes
- Intra-community contraction: O(2^10) ~ 1024 per community
- Inter-community bonds: ~15 edges cut
- Effective contraction complexity: O(2^15) = 32768
- Compare to state vector: O(2^50) = 1.1 * 10^15

Memory comparison:
- State vector: 2^50 * 16 bytes = 16 PiB (impossible)
- Tensor network: ~100 MiB working memory
- Speedup factor: practically infinite (feasible vs infeasible)
```

```
Contraction Diagram (simplified):

Community A        Community B        Community C
 [q0-q9]            [q10-q19]          [q20-q29]
    |                   |                  |
    +--- bond=2^3 ------+---- bond=2^4 ----+
                        |
Community D        Community E
 [q30-q39]          [q40-q49]
    |                   |
    +--- bond=2^3 ------+

Peak intermediate tensor: 2^15 elements = 512 KiB
```
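The arithmetic in the example can be verified mechanically; this standalone sketch only restates the numbers from the figures above:

```rust
fn main() {
    // 3-regular graph on 50 nodes has 50 * 3 / 2 = 75 edges.
    let nodes = 50u32;
    let edges = nodes * 3 / 2;
    assert_eq!(edges, 75);

    // Gate count: 50 H (|+> prep) + 75 ZZ + 50 Rx = 175.
    let gates = nodes + edges + nodes;
    assert_eq!(gates, 175);

    // State vector: 2^50 amplitudes * 16 bytes = 16 PiB.
    let sv_bytes: u128 = 16u128 << 50;
    assert_eq!(sv_bytes >> 50, 16); // PiB

    // Peak TN intermediate: 2^15 elements * 16 bytes = 512 KiB.
    let peak_bytes: u64 = (1u64 << 15) * 16;
    assert_eq!(peak_bytes, 512 * 1024);
}
```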
### 8. Integration with State Vector Backend

Both backends implement the same trait:

```rust
pub trait SimulationBackend {
    /// Execute the circuit and return measurement results.
    fn execute(
        &self,
        circuit: &QuantumCircuit,
        shots: usize,
        config: &SimulationConfig,
    ) -> Result<SimulationResult, SimulationError>;

    /// Compute expectation value of an observable.
    fn expectation_value(
        &self,
        circuit: &QuantumCircuit,
        observable: &Observable,
        config: &SimulationConfig,
    ) -> Result<f64, SimulationError>;

    /// Return the backend name for logging.
    fn name(&self) -> &'static str;
}
```

Users interact through `QuantumCircuit` and never need to know which backend is active:

```rust
let circuit = QuantumCircuit::new(50)
    .h_all()
    .append_qaoa_layer(graph, gamma, beta)
    .measure_all();

// Automatic backend selection
let result = ruqu::execute(&circuit, 1000)?;
// -> Internally selects TensorNetwork backend due to n=50, low treewidth

// Or explicit backend override
let result = ruqu::execute_with_backend(
    &circuit,
    1000,
    Backend::TensorNetwork(TnConfig::default()),
)?;
```
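A minimal sketch of how a backend plugs into the trait and how the engine could dispatch through a trait object. The stub types stand in for the real `ruqu-core` definitions, `expectation_value` is omitted for brevity, and `NullBackend` is a hypothetical toy, not a shipped backend:

```rust
// Stub types standing in for the real ruqu-core definitions.
pub struct QuantumCircuit;
pub struct SimulationConfig;
#[derive(Debug)]
pub struct SimulationResult { pub shots: usize }
#[derive(Debug)]
pub struct SimulationError;

pub trait SimulationBackend {
    fn execute(
        &self,
        circuit: &QuantumCircuit,
        shots: usize,
        config: &SimulationConfig,
    ) -> Result<SimulationResult, SimulationError>;
    fn name(&self) -> &'static str;
}

/// Toy backend: records the shot count so dispatch can be demonstrated.
struct NullBackend;

impl SimulationBackend for NullBackend {
    fn execute(
        &self,
        _circuit: &QuantumCircuit,
        shots: usize,
        _config: &SimulationConfig,
    ) -> Result<SimulationResult, SimulationError> {
        Ok(SimulationResult { shots })
    }
    fn name(&self) -> &'static str { "null" }
}

fn main() {
    // Runtime dispatch through a trait object, as the engine would do
    // after recommend_backend() picks a backend.
    let backend: Box<dyn SimulationBackend> = Box::new(NullBackend);
    let result = backend.execute(&QuantumCircuit, 1000, &SimulationConfig).unwrap();
    assert_eq!(result.shots, 1000);
    assert_eq!(backend.name(), "null");
}
```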
### 9. Future: ruvector-mincut Integration for Contraction Ordering

The `ruvector-mincut` crate currently solves balanced graph partitioning for vector index sharding. The same algorithm directly applies to tensor network contraction ordering via the following correspondence:

| Graph partitioning concept | TN contraction concept |
|---|---|
| Vertex | Tensor |
| Edge weight | Bond dimension (log2) |
| Partition | Contraction subtree |
| Edge cut | Inter-partition bond cost |
| Balanced partition | Balanced contraction tree |

Phase 1 (this ADR): Use `ruvector-mincut` for community detection in contraction path optimization.

Phase 2 (future): Extend `ruvector-mincut` with hypergraph partitioning for multi-index tensor contractions, enabling higher-order tensor networks (e.g., PEPS for 2D circuits).
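The edge-weight row of the correspondence table (weight = log2 of bond dimension) means a cut's contraction cost is 2 raised to the summed edge weights crossing it; a toy check of that mapping, independent of the crate's actual API:

```rust
fn main() {
    // Bond dimensions on three edges crossing a partition boundary.
    let bonds = [2u32, 4, 8];
    // A min-cut solver would see log2 weights: 1 + 2 + 3 = 6.
    // (trailing_zeros == log2 for exact powers of two.)
    let cut_weight: u32 = bonds.iter().map(|b| b.trailing_zeros()).sum();
    assert_eq!(cut_weight, 6);
    // Cost of the combined inter-partition index: 2^6 = 64.
    assert_eq!(1u64 << cut_weight, 64);
}
```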
## Consequences

### Positive

1. **Dramatically expanded qubit range**: Shallow circuits on 40-60 qubits become tractable on commodity hardware.
2. **Surface code simulation**: Distance-7 surface codes (49 data + 48 ancilla = 97 qubits) can be simulated for decoder validation using MPS (the circuit is geometrically local).
3. **Unified interface**: Users write circuits once; backend selection is automatic.
4. **Synergy with ruvector-mincut**: Leverages existing graph partitioning investment.
5. **Complementary to state vector**: Each backend covers the other's weakness.

### Negative

1. **Implementation complexity**: Tensor contraction, SVD truncation, and path optimization are non-trivial to implement correctly and efficiently.
2. **Approximation risk**: MPS truncation introduces controlled but nonzero error. Users must understand fidelity estimates.
3. **Compilation time**: The `ndarray` and `petgraph` dependencies add to compile time when the feature is enabled.
4. **Testing surface**: Two backends double the testing matrix for correctness validation.
5. **Performance unpredictability**: Contraction cost depends on circuit structure in ways that are hard to predict without running the path optimizer.
### Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Path optimizer finds poor ordering | Medium | High cost | Multiple heuristics + timeout fallback to greedy |
| MPS fidelity silently degrades | Medium | Incorrect results | Track discarded weight; warn if fidelity < 0.99 |
| Feature interaction bugs | Low | Incorrect results | Shared test suite: both backends must agree on small circuits |
| Memory spike during contraction | Medium | OOM | Pre-estimate peak intermediate tensor size; abort if too large |
## References

- QuantRS2 tensor network implementation: internal reference
- Markov & Shi, "Simulating Quantum Computation by Contracting Tensor Networks" (2008)
- Gray & Kourtis, "Hyper-optimized tensor network contraction" (2021) -- cotengra
- Schollwöck, "The density-matrix renormalization group in the age of matrix product states" (2011)
- ADR-QE-001: Core Engine Architecture (state vector backend)
- ADR-QE-005: WASM Compilation Target
- `ruvector-mincut` crate documentation
- ADR-014: Coherence Engine (graph partitioning reuse)