Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
This commit is contained in:
@@ -0,0 +1,688 @@
|
||||
# Physics Foundations of Thermodynamic Learning
|
||||
## Mathematical Foundations and Physical Principles
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
1. [Statistical Mechanics Primer](#1-statistical-mechanics-primer)
|
||||
2. [Information Theory and Physics](#2-information-theory-and-physics)
|
||||
3. [Landauer's Principle: Detailed Derivation](#3-landauers-principle-detailed-derivation)
|
||||
4. [Non-Equilibrium Thermodynamics](#4-non-equilibrium-thermodynamics)
|
||||
5. [Stochastic Thermodynamics](#5-stochastic-thermodynamics)
|
||||
6. [Free Energy and Variational Inference](#6-free-energy-and-variational-inference)
|
||||
7. [Energy-Based Models: Physical Interpretation](#7-energy-based-models-physical-interpretation)
|
||||
8. [Thermodynamic Bounds on Computation](#8-thermodynamic-bounds-on-computation)
|
||||
|
||||
---
|
||||
|
||||
## 1. Statistical Mechanics Primer
|
||||
|
||||
### 1.1 Microcanonical Ensemble
|
||||
|
||||
For an isolated system with energy E:
|
||||
```
|
||||
Ω(E) = number of microstates with energy E
|
||||
S = k ln Ω(E) (Boltzmann entropy)
|
||||
```
|
||||
|
||||
**Physical Meaning**: Entropy measures the logarithm of accessible microstates.
|
||||
|
||||
### 1.2 Canonical Ensemble
|
||||
|
||||
For a system in thermal contact with reservoir at temperature T:
|
||||
```
|
||||
p(E_i) = (1/Z) exp(-E_i / kT)
|
||||
Z = Σ_i exp(-E_i / kT) (partition function)
|
||||
```
|
||||
|
||||
**Thermodynamic quantities**:
|
||||
```
|
||||
Free Energy: F = -kT ln Z = ⟨E⟩ - TS
|
||||
Entropy: S = -k Σ_i p_i ln p_i = -k⟨ln p⟩
|
||||
Average E: ⟨E⟩ = Σ_i p_i E_i
|
||||
Heat Capacity: C = d⟨E⟩/dT
|
||||
```
|
||||
|
||||
### 1.3 Boltzmann Distribution
|
||||
|
||||
The probability of state with energy E at temperature T:
|
||||
```
|
||||
p(E) ∝ exp(-E/kT) = exp(-βE)
|
||||
```
|
||||
|
||||
where β = 1/(kT) is the **inverse temperature** (coldness).
|
||||
|
||||
**Key Insight**: Physical systems naturally sample from probability distributions weighted by exp(-energy).
|
||||
|
||||
### 1.4 Fluctuation-Dissipation Theorem
|
||||
|
||||
Thermal fluctuations and dissipation are related:
|
||||
```
|
||||
⟨δx(t) δx(0)⟩ = (kT/γ) exp(-γt/m)
|
||||
```
|
||||
|
||||
**Implication**: Cannot have low-noise system without dissipation. Thermal noise is fundamental at temperature T.
|
||||
|
||||
---
|
||||
|
||||
## 2. Information Theory and Physics
|
||||
|
||||
### 2.1 Shannon Entropy
|
||||
|
||||
For discrete probability distribution p(x):
|
||||
```
|
||||
H[p] = -Σ_x p(x) log₂ p(x) (bits)
|
||||
= -k Σ_x p(x) ln p(x) (thermodynamic units)
|
||||
```
|
||||
|
||||
**Connection to Thermodynamics**: Shannon entropy has same mathematical form as Boltzmann/Gibbs entropy.
|
||||
|
||||
### 2.2 Mutual Information
|
||||
|
||||
Information shared between variables X and Y:
|
||||
```
|
||||
I(X; Y) = H[X] + H[Y] - H[X,Y]
|
||||
= Σ p(x,y) log[p(x,y) / (p(x)p(y))]
|
||||
```
|
||||
|
||||
**Physical Meaning**: Mutual information quantifies correlations—how much knowing X tells you about Y.
|
||||
|
||||
### 2.3 Kullback-Leibler Divergence
|
||||
|
||||
"Distance" from distribution q to distribution p:
|
||||
```
|
||||
D_KL[q || p] = Σ q(x) log[q(x)/p(x)]
|
||||
= ⟨log q - log p⟩_q
|
||||
```
|
||||
|
||||
**Properties**:
|
||||
- Always non-negative: D_KL ≥ 0
|
||||
- Zero iff q = p almost everywhere
|
||||
- Not symmetric: D_KL[q||p] ≠ D_KL[p||q]
|
||||
|
||||
**Physical Interpretation**: Excess entropy when using wrong distribution.
|
||||
|
||||
### 2.4 Relative Entropy and Free Energy
|
||||
|
||||
For canonical ensemble:
|
||||
```
|
||||
D_KL[q || p_β] = Σ q(x) log[q(x)] - Σ q(x) log[exp(-βE(x))/Z]
|
||||
= -S[q]/k + β⟨E⟩_q + log Z
|
||||
= β(F[q] - F[p])
|
||||
```
|
||||
|
||||
**Key Insight**: KL divergence to Boltzmann distribution = free energy difference (in units of kT).
|
||||
|
||||
---
|
||||
|
||||
## 3. Landauer's Principle: Detailed Derivation
|
||||
|
||||
### 3.1 Setup: Bit Erasure
|
||||
|
||||
Consider a 1-bit memory:
|
||||
- **Initial state**: Unknown (0 or 1 with probabilities p₀, p₁)
|
||||
- **Final state**: Known (forced to 0)
|
||||
|
||||
### 3.2 Information-Theoretic Analysis
|
||||
|
||||
Initial entropy:
|
||||
```
|
||||
S_initial = -k[p₀ ln p₀ + p₁ ln p₁]
|
||||
```
|
||||
|
||||
Final entropy:
|
||||
```
|
||||
S_final = 0 (definite state)
|
||||
```
|
||||
|
||||
Change in information:
|
||||
```
|
||||
ΔI = S_initial - S_final = -k[p₀ ln p₀ + p₁ ln p₁]
|
||||
```
|
||||
|
||||
For maximum erasure (p₀ = p₁ = 1/2):
|
||||
```
|
||||
ΔI = k ln 2
|
||||
```
|
||||
|
||||
### 3.3 Thermodynamic Analysis
|
||||
|
||||
**Second Law**: Total entropy (system + environment) cannot decrease:
|
||||
```
|
||||
ΔS_total = ΔS_system + ΔS_environment ≥ 0
|
||||
```
|
||||
|
||||
For isothermal process:
|
||||
```
|
||||
ΔS_environment = Q/T
|
||||
```
|
||||
|
||||
where Q is heat dissipated to environment.
|
||||
|
||||
**Combining**:
|
||||
```
|
||||
ΔS_system + Q/T ≥ 0
|
||||
-k ln 2 + Q/T ≥ 0
|
||||
Q ≥ kT ln 2
|
||||
```
|
||||
|
||||
**Landauer's Principle**: Erasing 1 bit of information requires dissipating at least kT ln 2 of heat.
|
||||
|
||||
### 3.4 Physical Implementation: Szilard Engine
|
||||
|
||||
**1-Molecule Gas Engine**:
|
||||
1. Single molecule in box (unknown side)
|
||||
2. Insert partition (0 information about position)
|
||||
3. Measure which side (gain 1 bit)
|
||||
4. Attach piston to occupied side
|
||||
5. Extract work kT ln 2 via isothermal expansion
|
||||
6. Remove partition
|
||||
7. **Erase measurement record** → Dissipate kT ln 2
|
||||
|
||||
**Cycle**: Extract work using information, pay thermodynamic cost to erase memory.
|
||||
|
||||
### 3.5 Generalization: Arbitrary Distribution
|
||||
|
||||
For erasing memory in state with probability distribution p(x):
|
||||
```
|
||||
Q ≥ kT × H[p] = -kT Σ p(x) ln p(x)
|
||||
```
|
||||
|
||||
**More uncertain initial state → More heat dissipated.**
|
||||
|
||||
---
|
||||
|
||||
## 4. Non-Equilibrium Thermodynamics
|
||||
|
||||
### 4.1 Entropy Production
|
||||
|
||||
For a system driven out of equilibrium:
|
||||
```
|
||||
dS/dt = d_iS/dt + d_eS/dt
|
||||
```
|
||||
|
||||
- d_iS/dt = internal entropy production (≥ 0)
|
||||
- d_eS/dt = entropy flow from environment (can be negative)
|
||||
|
||||
**Second Law**: d_iS/dt ≥ 0 always.
|
||||
|
||||
### 4.2 Jarzynski Equality
|
||||
|
||||
For a system driven from equilibrium at λ=0 to λ=1:
|
||||
```
|
||||
⟨exp(-βW)⟩ = exp(-βΔF)
|
||||
```
|
||||
|
||||
Where:
|
||||
- W = work performed on system
|
||||
- ΔF = free energy difference
|
||||
- ⟨⟩ = average over many realizations
|
||||
|
||||
**Implication**: Can extract equilibrium free energy from non-equilibrium processes.
|
||||
|
||||
### 4.3 Crooks Fluctuation Theorem
|
||||
|
||||
Ratio of forward to reverse process probabilities:
|
||||
```
|
||||
P(W_forward) / P(-W_reverse) = exp(β(W - ΔF))
|
||||
```
|
||||
|
||||
**Special case (Jarzynski)**: Integrate over W.
|
||||
|
||||
### 4.4 Entropy Production Rate
|
||||
|
||||
For driven system:
|
||||
```
|
||||
Σ̇ = (1/T) Σ_i J_i X_i ≥ 0
|
||||
```
|
||||
|
||||
Where:
|
||||
- J_i = thermodynamic flux (current)
|
||||
- X_i = thermodynamic force (gradient)
|
||||
|
||||
**Examples**:
|
||||
- Heat flux: J = heat current, X = ∇(1/T)
|
||||
- Particle flux: J = particle current, X = -∇μ
|
||||
- Chemical reactions: J = reaction rate, X = -ΔG/T
|
||||
|
||||
---
|
||||
|
||||
## 5. Stochastic Thermodynamics
|
||||
|
||||
### 5.1 Langevin Equation
|
||||
|
||||
For a particle in potential V(x) with friction γ and thermal noise:
|
||||
```
|
||||
m(d²x/dt²) = -γ(dx/dt) - dV/dx + ξ(t)
|
||||
```
|
||||
|
||||
Where noise satisfies:
|
||||
```
|
||||
⟨ξ(t)⟩ = 0
|
||||
⟨ξ(t)ξ(t')⟩ = 2γkT δ(t-t') (fluctuation-dissipation)
|
||||
```
|
||||
|
||||
**Overdamped limit** (low inertia):
|
||||
```
|
||||
γ(dx/dt) = -dV/dx + ξ(t)
|
||||
dx/dt = -(1/γ)dV/dx + √(2D) η(t)
|
||||
```
|
||||
|
||||
where D = kT/γ (Einstein relation).
|
||||
|
||||
### 5.2 Fokker-Planck Equation
|
||||
|
||||
Evolution of probability distribution p(x,t):
|
||||
```
|
||||
∂p/∂t = -∂/∂x[v(x)p] + D ∂²p/∂x²
|
||||
```
|
||||
|
||||
- First term: deterministic drift
|
||||
- Second term: diffusion
|
||||
|
||||
**Steady state**: ∂p/∂t = 0 gives Boltzmann distribution.
|
||||
|
||||
### 5.3 Stochastic Entropy Production
|
||||
|
||||
Along a single trajectory:
|
||||
```
|
||||
Δs_tot = Δs_system + Δs_environment
|
||||
= ln[p(x_initial)/p(x_final)] + βQ
|
||||
```
|
||||
|
||||
**Average**: ⟨Δs_tot⟩ ≥ 0 (second law)
|
||||
|
||||
### 5.4 Information-Theoretic Formulation
|
||||
|
||||
For feedback control (Maxwell's demon):
|
||||
```
|
||||
⟨Δs_tot⟩ = ⟨Δs_system⟩ + ⟨Δs_environment⟩ - I
|
||||
≥ 0
|
||||
```
|
||||
|
||||
Where I = mutual information between system and controller.
|
||||
|
||||
**Sagawa-Ueda Generalized Second Law**:
|
||||
```
|
||||
⟨W⟩ ≥ ΔF - kT × I
|
||||
```
|
||||
|
||||
Can extract up to kT×I extra work using information.
|
||||
|
||||
---
|
||||
|
||||
## 6. Free Energy and Variational Inference
|
||||
|
||||
### 6.1 Helmholtz Free Energy
|
||||
|
||||
For system at temperature T:
|
||||
```
|
||||
F = ⟨E⟩ - TS = U - TS
|
||||
```
|
||||
|
||||
**Equilibrium condition**: F is minimized.
|
||||
|
||||
**Physical meaning**:
|
||||
- U = ⟨E⟩ = average energy (favors low energy states)
|
||||
- -TS = entropy contribution (favors high entropy)
|
||||
- F balances energy and entropy
|
||||
|
||||
### 6.2 Variational Free Energy (Friston)
|
||||
|
||||
For generative model p(x,s) and observations s:
|
||||
```
|
||||
F[q] = E_q[E(x,s)] - H[q(x|s)]
|
||||
= -E_q[log p(x,s)] + E_q[log q(x|s)]
|
||||
= -log p(s) + D_KL[q(x|s) || p(x|s)]
|
||||
```
|
||||
|
||||
Where:
|
||||
- x = hidden states
|
||||
- s = sensory observations
|
||||
- q(x|s) = approximate posterior (beliefs)
|
||||
- p(x|s) = true posterior
|
||||
|
||||
**Key Properties**:
|
||||
1. F ≥ -log p(s) with equality when q = p
|
||||
2. Minimizing F ⟺ maximizing evidence p(s)
|
||||
3. F decomposes into energy and entropy
|
||||
|
||||
### 6.3 Free Energy Principle
|
||||
|
||||
**Biological systems minimize variational free energy:**
|
||||
```
|
||||
dF/dt ≤ 0
|
||||
```
|
||||
|
||||
**Mechanisms**:
|
||||
1. **Perception**: Update beliefs q to minimize F (∂F/∂q)
|
||||
2. **Action**: Change sensory input s to minimize F (∂F/∂s)
|
||||
|
||||
**Connection to Thermodynamics**:
|
||||
- Variational free energy ↔ Helmholtz free energy
|
||||
- Minimizing surprise ↔ Resisting disorder
|
||||
- Living systems are non-equilibrium steady states
|
||||
|
||||
### 6.4 Active Inference
|
||||
|
||||
Expected free energy for policy π:
|
||||
```
|
||||
G[π] = E_π[F[q]] + D_KL[q(s|π) || p(s)]
|
||||
```
|
||||
|
||||
**Decomposition**:
|
||||
```
|
||||
G = Pragmatic value + Epistemic value
|
||||
= E_π[log q(s)] - E_π[log p(s|x)] (ambiguity)
|
||||
+ E_π[H[p(x|s)]] (risk)
|
||||
```
|
||||
|
||||
**Interpretation**:
|
||||
- Pragmatic: Achieve preferred outcomes
|
||||
- Epistemic: Resolve uncertainty about world
|
||||
|
||||
---
|
||||
|
||||
## 7. Energy-Based Models: Physical Interpretation
|
||||
|
||||
### 7.1 Boltzmann Machines
|
||||
|
||||
Probability distribution over binary variables s_i ∈ {0,1}:
|
||||
```
|
||||
p(s) = (1/Z) exp(-E(s)/T)
|
||||
```
|
||||
|
||||
Energy function:
|
||||
```
|
||||
E(s) = -Σ_ij W_ij s_i s_j - Σ_i b_i s_i
|
||||
```
|
||||
|
||||
**Physical interpretation**:
|
||||
- W_ij = coupling strength (interaction energy)
|
||||
- b_i = external field (bias)
|
||||
- T = temperature (controls randomness)
|
||||
|
||||
### 7.2 Hopfield Networks
|
||||
|
||||
Symmetric weights, energy function:
|
||||
```
|
||||
E = -(1/2) Σ_ij W_ij s_i s_j - Σ_i b_i s_i
|
||||
```
|
||||
|
||||
**Dynamics** (asynchronous update):
|
||||
```
|
||||
s_i(t+1) = sign(Σ_j W_ij s_j(t) + b_i)
|
||||
```
|
||||
|
||||
**Energy decreases** (or stays constant) with each update:
|
||||
```
|
||||
ΔE = E(t+1) - E(t) ≤ 0
|
||||
```
|
||||
|
||||
**Attractor dynamics**: System settles to local energy minima (memories).
|
||||
|
||||
### 7.3 Equilibrium Propagation
|
||||
|
||||
**Free phase**:
|
||||
```
|
||||
τ ds/dt = -∂E(s,y)/∂s
|
||||
```
|
||||
|
||||
Settles to equilibrium s* where ∂E/∂s = 0.
|
||||
|
||||
**Nudged phase**:
|
||||
```
|
||||
τ ds/dt = -∂E(s,y)/∂s - β(y - y_target)
|
||||
```
|
||||
|
||||
Gently pushes toward target.
|
||||
|
||||
**Learning rule**:
|
||||
```
|
||||
dW/dt ∝ ⟨s_i s_j⟩_nudged - ⟨s_i s_j⟩_free
|
||||
```
|
||||
|
||||
**Physical interpretation**:
|
||||
- Free phase: Thermodynamic equilibration
|
||||
- Nudged phase: Perturbed equilibrium
|
||||
- Learning: Adjust weights to make nudge smaller
|
||||
|
||||
### 7.4 Connection to Contrastive Divergence
|
||||
|
||||
Gradient of log-likelihood for Boltzmann machine:
|
||||
```
|
||||
∂log p(s_data)/∂W_ij = ⟨s_i s_j⟩_data - ⟨s_i s_j⟩_model
|
||||
```
|
||||
|
||||
**Positive phase**: ⟨⟩_data from observations
|
||||
**Negative phase**: ⟨⟩_model from sampling equilibrium
|
||||
|
||||
**Equilibrium propagation** is continuous-time, deterministic version.
|
||||
|
||||
---
|
||||
|
||||
## 8. Thermodynamic Bounds on Computation
|
||||
|
||||
### 8.1 Landauer Bound
|
||||
|
||||
Already derived: Erasing n bits dissipates at least:
|
||||
```
|
||||
Q ≥ n × kT ln 2
|
||||
```
|
||||
|
||||
### 8.2 Margolus-Levitin Bound
|
||||
|
||||
Maximum speed of computation (orthogonal quantum states):
|
||||
```
|
||||
τ ≥ πℏ / (2E)
|
||||
```
|
||||
|
||||
Where E is energy of system.
|
||||
|
||||
**Interpretation**: Fundamental tradeoff between speed and energy. More energy → faster computation.
|
||||
|
||||
### 8.3 Bekenstein Bound
|
||||
|
||||
Maximum information in region of space:
|
||||
```
|
||||
I ≤ 2πRE / (ℏc ln 2)
|
||||
```
|
||||
|
||||
Where R is radius, E is energy.
|
||||
|
||||
**For spherical region**:
|
||||
```
|
||||
I ≤ (A/4) × (k ln 2 / (ℏG)) ≈ A/(4 L_P²)
|
||||
```
|
||||
|
||||
Where A is surface area, L_P is Planck length.
|
||||
|
||||
**Interpretation**: Holographic bound—information scales with area, not volume.
|
||||
|
||||
### 8.4 Lloyd's Bound
|
||||
|
||||
Ultimate speed of computation:
|
||||
```
|
||||
Operations/sec ≤ E / (πℏ) ≈ 10^51 × (E/1kg)
|
||||
```
|
||||
|
||||
**Example**: 1 kg of matter → 10^51 ops/sec maximum.
|
||||
|
||||
### 8.5 Synthesis: Multi-Dimensional Limits
|
||||
|
||||
Computation is bounded by:
|
||||
|
||||
| Resource | Bound | Limiting Constant |
|
||||
|----------|-------|-------------------|
|
||||
| Energy per bit erased | E ≥ kT ln 2 | Boltzmann constant k |
|
||||
| Speed vs. energy | τ ≥ πℏ/2E | Planck constant ℏ |
|
||||
| Information per energy | I ≤ E/(kT ln 2) | kT ln 2 |
|
||||
| Ops per energy | N ≤ E/(πℏ) | ℏ |
|
||||
| Info per volume | I ≤ A/(4L_P²) | Planck area |
|
||||
|
||||
**Key Insight**: All fundamental limits trace back to h, k, c, G—the fundamental constants of physics.
|
||||
|
||||
---
|
||||
|
||||
## 9. Thermodynamic Cost of Learning
|
||||
|
||||
### 9.1 Information-Theoretic View
|
||||
|
||||
**Learning**: Extracting model θ from data D.
|
||||
|
||||
**Information gained**:
|
||||
```
|
||||
I(D; θ) = H[θ] - H[θ|D]
|
||||
```
|
||||
|
||||
**Minimum thermodynamic cost**:
|
||||
```
|
||||
Q ≥ kT × I(D; θ)
|
||||
```
|
||||
|
||||
**Interpretation**: Must dissipate heat proportional to information extracted from data.
|
||||
|
||||
### 9.2 PAC Learning Bounds
|
||||
|
||||
Probably Approximately Correct (PAC) learning requires:
|
||||
```
|
||||
m ≥ (1/ε²) × [d log(1/ε) + log(1/δ)]
|
||||
```
|
||||
|
||||
samples, where d = VC dimension.
|
||||
|
||||
**Thermodynamic cost**:
|
||||
```
|
||||
Q ≥ kT × m × (log |X| + log |Y|)
|
||||
```
|
||||
|
||||
**Implication**: Harder learning problems (larger d, smaller ε) have higher energy cost.
|
||||
|
||||
### 9.3 Generalization and Thermodynamics
|
||||
|
||||
**Hypothesis**: Thermodynamic cost of learning is related to generalization gap.
|
||||
|
||||
**Intuition**:
|
||||
- Memorization: High mutual information I(D; θ)
|
||||
- Generalization: Low mutual information (compressed representation)
|
||||
|
||||
**Possible bound**:
|
||||
```
|
||||
Generalization gap ∝ I(D; θ) / |D|
|
||||
```
|
||||
|
||||
**Thermodynamic consequence**:
|
||||
- Overparameterized models: High I(D; θ) → High energy cost
|
||||
- Regularized models: Low I(D; θ) → Low energy cost
|
||||
|
||||
**Prediction**: Energy-efficient learning favors generalizable models.
|
||||
|
||||
---
|
||||
|
||||
## 10. Mathematical Toolbox
|
||||
|
||||
### 10.1 Useful Inequalities
|
||||
|
||||
**Jensen's Inequality**: For convex function f:
|
||||
```
|
||||
f(E[X]) ≤ E[f(X)]
|
||||
```
|
||||
|
||||
**Gibbs Inequality**: D_KL[p||q] ≥ 0
|
||||
|
||||
**Log-Sum Inequality**:
|
||||
```
|
||||
Σ a_i log(a_i/b_i) ≥ (Σ a_i) log[(Σ a_i)/(Σ b_i)]
|
||||
```
|
||||
|
||||
### 10.2 Variational Principles
|
||||
|
||||
**ELBO (Evidence Lower Bound)**:
|
||||
```
|
||||
log p(x) ≥ E_q[log p(x,z)] - E_q[log q(z)]
|
||||
= -F[q]
|
||||
```
|
||||
|
||||
**Variational inference**: Maximize ELBO ⟺ Minimize free energy.
|
||||
|
||||
### 10.3 Calculus of Variations
|
||||
|
||||
To minimize functional F[q]:
|
||||
```
|
||||
δF/δq = 0
|
||||
```
|
||||
|
||||
**Example**: Find q that minimizes F = E_q[E] - TS[q]:
|
||||
```
|
||||
q(x) = (1/Z) exp(-E(x)/T) (Boltzmann distribution)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 11. Summary: Key Equations
|
||||
|
||||
### Fundamental Constants
|
||||
```
|
||||
k = 1.381 × 10⁻²³ J/K (Boltzmann)
|
||||
ℏ = 1.055 × 10⁻³⁴ J·s (Planck)
|
||||
c = 3 × 10⁸ m/s (Speed of light)
|
||||
```
|
||||
|
||||
### Thermodynamic Relations
|
||||
```
|
||||
F = U - TS (Helmholtz free energy)
|
||||
dF = -SdT - PdV (Fundamental relation)
|
||||
S = -k Σ p_i ln p_i (Entropy)
|
||||
p_i = (1/Z) exp(-E_i/kT) (Boltzmann distribution)
|
||||
```
|
||||
|
||||
### Information Theory
|
||||
```
|
||||
H[p] = -Σ p(x) log p(x) (Shannon entropy)
|
||||
I(X;Y) = H[X] - H[X|Y] (Mutual information)
|
||||
D_KL[q||p] = Σ q(x) log[q(x)/p(x)] (KL divergence)
|
||||
```
|
||||
|
||||
### Landauer and Computation
|
||||
```
|
||||
E_erase ≥ kT ln 2 (Landauer bound)
|
||||
τ_min ≥ πℏ/(2E) (Margolus-Levitin)
|
||||
I_max ≤ 2πRE/(ℏc ln 2) (Bekenstein)
|
||||
```
|
||||
|
||||
### Learning Bounds
|
||||
```
|
||||
E_learn ≥ kT × I(D; θ) (Information cost)
|
||||
F[q] = E_q[E] - TS (Variational free energy)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 12. Further Reading
|
||||
|
||||
**Classical Thermodynamics**:
|
||||
- Callen, *Thermodynamics and an Introduction to Thermostatistics*
|
||||
- Chandler, *Introduction to Modern Statistical Mechanics*
|
||||
|
||||
**Information Theory**:
|
||||
- Cover & Thomas, *Elements of Information Theory*
|
||||
- MacKay, *Information Theory, Inference, and Learning Algorithms*
|
||||
|
||||
**Information Thermodynamics**:
|
||||
- Sagawa & Ueda, "Minimal energy cost for thermodynamic information processing"
|
||||
- Parrondo et al., "Thermodynamics of information," *Nature Physics* (2015)
|
||||
|
||||
**Free Energy Principle**:
|
||||
- Friston, "The free-energy principle: a unified brain theory?" (2010)
|
||||
- Parr, Pezzulo, Friston, *Active Inference: The Free Energy Principle in Mind, Brain, and Behavior* (MIT Press, 2022)
|
||||
|
||||
**Energy-Based Learning**:
|
||||
- Scellier & Bengio, "Equilibrium Propagation" (2017)
|
||||
- Hinton, "Training Products of Experts by Minimizing Contrastive Divergence" (2002)
|
||||
|
||||
---
|
||||
|
||||
**Status**: Comprehensive mathematical foundation for thermodynamic learning
|
||||
**Last Updated**: December 2025
|
||||
**Prerequisites**: Statistical mechanics, information theory, calculus
|
||||
**Next**: Apply these principles to implement Landauer-optimal learning systems
|
||||
Reference in New Issue
Block a user