feat(odoh): ship ODoH client + self-hosted relay (RFC 9230) #121

Merged
razvandimescu merged 5 commits from feat/odoh into main 2026-04-20 21:26:54 +08:00

5 Commits

Author SHA1 Message Date
Razvan Dimescu
eb5ea3b645 refactor(odoh): deduplicate post-audit findings
- Hoist ODOH_CONTENT_TYPE to a single pub(crate) constant in odoh.rs;
  relay.rs imports it instead of declaring its own.
- Generalize dashboard encryptionPct(data, encryptedKeys, allKeys)
  so both Inbound Wire and Outbound Wire panels share the same math
  instead of drifting independently.
- Extract RelayState::new() and build_app() helpers in relay.rs so
  the test spawn_relay() and production run() wire the same router
  + body-limit layer. Prevents future middleware from landing in one
  path but not the other.

All 344 lib tests pass; no behavior change.
2026-04-20 16:03:34 +03:00
Razvan Dimescu
be60f6ccbc chore(packaging): docker-compose + Caddyfile for ODoH relay deploy
Two-container deploy: Caddy terminates TLS (auto-provisions Let's
Encrypt via ACME) and reverse-proxies to a Numa relay on an internal
Docker network. The relay never reads sealed payloads; Caddy's
access log is discarded so per-request observability doesn't defeat
the oblivious property.

Validated against Hetzner CX22 + DNS at odoh-relay.numa.rs:
- TLS-ALPN-01 challenge succeeded on first attempt
- /health returned the relay's counter block
- End-to-end ODoH client → relay → Cloudflare works

Operators only need to: set a DNS A record, edit Caddyfile's hostname,
docker compose up -d. README walks through the steps and the DNSCrypt
v3/odoh-relays.md submission to claim a public listing.
2026-04-20 15:44:29 +03:00
Razvan Dimescu
a3cc64c94f feat(odoh): relay bind-address CLI arg + dashboard Outbound Wire panel
- `numa relay [PORT] [BIND]` accepts an optional bind address (defaults
  to 127.0.0.1, matching the Caddy reverse-proxy deployment shape).
  Required for Docker, where the relay needs 0.0.0.0 inside the
  container so Caddy can reach it across the bridge network.

- Dashboard now surfaces the upstream_transport dimension as an
  "Outbound Wire" panel alongside the existing "Inbound Wire" (renamed
  from "Transport" for directional clarity). Sub-headers — "apps → numa"
  / "numa → internet" — make the threat-model split obvious without
  jargon. Bars: UDP/DoH/DoT/ODoH, headline "X% encrypted outbound".
  The PR description's promise that "the dashboard answers how much of
  my DNS traffic left in cleartext honestly" is now true.
2026-04-20 15:44:20 +03:00
Razvan Dimescu
cf128c19af feat(odoh): bootstrap-IP overrides + zero hedge for ODoH (post-deploy fixes)
Two issues surfaced from running mode = "odoh" against the live Hetzner
relay as system DNS:

1. **Bootstrap deadlock.** The reqwest HTTPS client resolves the relay
   and target hostnames via system DNS. When numa is itself the system
   resolver, the ODoH client loops trying to resolve through itself.
   Adds optional `relay_ip` and `target_ip` to `[upstream]`, plumbed
   into reqwest's `resolve()` so the HTTPS client bypasses system DNS
   for those two hostnames. TLS still validates against the URL
   hostname, so a stale IP fails loudly rather than silently MITM'ing.

2. **2x relay load.** Default `hedge_ms = 10` triggers a duplicate
   in-flight query for every request. Useful for UDP/DoH/DoT (rescues
   tail latency cheaply); wasteful for ODoH (doubles HPKE seal/unseal,
   doubles sealed-byte footprint a passive observer can correlate, no
   latency win — relay hop dominates either way). Force-zero in
   oblivious mode regardless of configured hedge_ms.

Validated end-to-end against odoh-relay.numa.rs → Cloudflare:
3 digs produced 3 forwarded_ok on the relay (was 6 before the hedge
fix), upstream_transport.odoh ticks correctly.
2026-04-20 15:44:09 +03:00
Razvan Dimescu
241c40553b feat(odoh): ship ODoH client + self-hosted relay (RFC 9230)
Client (mode = "odoh"): URL-query target routing per RFC 9230 §5,
/.well-known/odohconfigs TTL cache with 60s backoff on failure, HPKE
seal/open via odoh-rs, strict-mode default that SERVFAILs on relay
failure instead of silently downgrading. Host-equality config
validation rejects same-operator relay/target pairs.

Relay (`numa relay [PORT]`): axum server with /relay + /health.
SSRF-hardened hostname validator (RFC 1035 ASCII + dot + dash),
4 KiB body cap at the axum layer, 5s full-transaction timeout, and
static 502 on target failure (reqwest internals logged, not leaked).
Aggregate counters only — no per-request logs.

Observability: new `UpstreamTransport { Udp, Doh, Dot, Odoh }`
orthogonal to `QueryPath`, so /stats can tally wire protocols
symmetrically. Recursive mode records `Some(Udp)` for honest
"bytes egressing in cleartext" accounting.

Tests: Suite 8 exercises the client end-to-end via Frank Denis's
public relay + Cloudflare target; Suite 9 exercises `numa relay`
forwarding + guards against Cloudflare as the real far end. Full
probe script at tests/probe-odoh-ecosystem.sh verifies the entire
public ODoH ecosystem (4 targets + 1 relay per DNSCrypt's curated
list — confirms deploying Numa's relay doubles global supply).
2026-04-20 12:34:04 +03:00