Browsers heuristically cached the dashboard page because the response
carried no Cache-Control header, so a numa upgrade on the daemon did
not surface updated PATH_DEFS (e.g. the UPSTREAM row added in v0.14.0)
until the user hard-reloaded. Force revalidation on every load.
Closes #144.
docs/ is gitignored; references to docs/implementation/*.md from public
source, configs, and packaging were dead links outside the maintainer
machine. Adds four recipes (README, dnsdist-front, doh-on-lan,
odoh-upstream) under top-level recipes/ and repoints existing pointers.
- numa.toml, packaging/client/{README.md,numa.toml}: point to
recipes/odoh-upstream.md.
- src/{bootstrap_resolver,forward,serve}.rs: reference issue #122
directly (module scope is broader than the ODoH-specific recipe).
- src/health.rs: drop the §-ref; iOS HealthInfo remains named as the
canonical consumer.
The example in `numa.toml` pointed at `https://odoh-relay.numa.rs/proxy`,
but the relay only serves the ODoH endpoint at `/relay` (every other
reference in the tree — `src/config.rs` docs and tests, and
`packaging/client/numa.toml` — uses `/relay`). Users who copied the
example got `404 Not Found` on every query and SERVFAIL at the client.
Reported in #138.
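A minimal corrected sketch; the `[upstream]` table and key name are assumptions for illustration — only the URL path comes from this fix:

```toml
[upstream]
mode = "odoh"
# The relay serves ODoH at /relay; /proxy returns 404 on every query
# and the client SERVFAILs.
relay = "https://odoh-relay.numa.rs/relay"
```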
SOA records were stored as opaque bytes (DnsRecord::UNKNOWN), so the
RFC 1035 §3.3.13 MNAME/RNAME name-compression pointers — offsets into
the upstream packet — were re-emitted verbatim. Once Numa applied its
own compression to surrounding names, those pointers landed on garbage
and clients rejected the reply ("malformed reply packet" in kdig).
Parse SOA via read_qname and write via write_qname, matching the
NS/CNAME/MX pattern. Adds the canonical-rdata arm in dnssec.rs for
RRSIG verification. Regression test round-trips a CNAME-chain response
with a compressed SOA in authority through hickory-proto strict parse.
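The decompression rule at issue can be sketched std-only; numa's real read_qname lives on its packet buffer type, so the function shape, bounds checks, and jump limit here are illustrative:

```rust
// RFC 1035 §4.1.4 name decompression: a length byte with the top two
// bits set is a 14-bit pointer back into the *same* packet. Re-emitting
// such a pointer verbatim into a differently-compressed packet — the
// SOA MNAME/RNAME bug — lands it on garbage.
fn read_qname(buf: &[u8], mut pos: usize) -> Result<String, String> {
    let mut out = String::new();
    let mut jumps = 0;
    loop {
        let len = *buf.get(pos).ok_or("truncated")? as usize;
        if len & 0xC0 == 0xC0 {
            // Compression pointer: follow the offset instead of copying it.
            let b2 = *buf.get(pos + 1).ok_or("truncated")? as usize;
            pos = ((len ^ 0xC0) << 8) | b2;
            jumps += 1;
            if jumps > 5 {
                return Err("pointer loop".into());
            }
        } else if len == 0 {
            return Ok(out); // root label terminates the name
        } else {
            let label = buf.get(pos + 1..pos + 1 + len).ok_or("truncated")?;
            if !out.is_empty() {
                out.push('.');
            }
            out.push_str(&String::from_utf8_lossy(label));
            pos += len + 1;
        }
    }
}
```

Writing back through write_qname (as NS/CNAME/MX already do) lets the serializer re-compress names against the outgoing packet instead of trusting upstream offsets.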
Hedging fires a second upstream query against the same upstream after
the hedge delay. Rescues packet loss and handshake stalls on flaky
links, but every lookup shows up twice at the provider — silently
halves the headroom for anyone on a quota'd upstream (NextDNS free tier,
Control D, paid Quad9).
Surfaced by #134 (bcookatpcsd), who saw every query duplicated on the
NextDNS dashboard with a single-address DoT upstream. Not a bug — the
feature doing what it says on the tin — but a surprising default.
Flipping the default to 0 makes hedging explicitly opt-in. Users who
want tail-latency rescue on flaky nets add `hedge_ms = 10` (or higher).
No config migration needed; no breaking changes to the API surface.
Also tightens the numa.toml comment so the trade-off is visible at
config time, not retroactively on a provider dashboard.
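For operators opting back in, a hedged config sketch (whether hedge_ms sits under `[upstream]` is an assumption here; the knob name is from this change):

```toml
[upstream]
# Hedging is opt-in: each hedged lookup reaches the provider twice.
# Enable on flaky links for tail-latency rescue; leave at 0 on
# quota'd upstreams (NextDNS free tier, Control D, paid Quad9).
hedge_ms = 10
```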
Single-container docker-compose recipe for running numa in ODoH client
mode. Ships with a starter numa.toml pointing at odoh-relay.numa.rs
paired with Cloudflare's ODoH target — two independent operators with
distinct eTLD+1s, so the default passes numa's same-operator check.
Exposes :53 UDP+TCP for LAN clients and :5380 for the dashboard + REST
API. README covers prerequisites, deploy, verification, and the ODoH
privacy boundary (relay sees IP, target sees query, neither sees both).
Advertised alongside packaging/relay/ in the main README Docker section.
Complements the bootstrap resolver fix (#122, #126) by documenting the
ODoH knobs in the commented config template. Explains relay_ip/target_ip
as the way to prevent plain-DNS leaks of the relay/target hostnames via
the bootstrap resolver on cold boot when numa is its own system DNS.
Advisory published 2026-04-22: reachable panic in certificate revocation
list parsing. Patch is a lockfile-only bump — transitive via rustls, no
direct dep changes. Unblocks cargo audit in CI across all open PRs.
Replaces the plain `python3 -m http.server` + one-shot `make blog` with a
watcher pipeline: chokidar regenerates HTML on MD/template changes,
browser-sync serves the site and reloads the browser on rendered-asset
changes. First run downloads both via npx; subsequent runs are instant.
Preflight checks for npx and pandoc. Port arg parsing is tolerant of
legacy `--drafts` flag ordering (drafts are always included now, since
that's what the dev loop actually wants).
Cleanup trap kills the watcher on exit so re-runs don't leave orphans.
- Switch overrides from HashMap to BTreeMap — deterministic iteration by
type, drops the manual sort when logging.
- Rename the flat_map closure's inner `ips` to `addrs` to stop shadowing
the outer Vec<String>.
- Trim the Suite 8 TEST-NET-1 comment to keep the "why" and drop
mechanism narration.
- Drop a redundant sleep 1 after wait — wait already blocks on exit.
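A toy sketch of the BTreeMap swap's payoff: iteration is already key-ordered, so logging needs no manual sort (names illustrative):

```rust
use std::collections::BTreeMap;

// BTreeMap iterates in key order — deterministic across runs, unlike
// HashMap, so the manual sort before logging can be dropped.
fn render_overrides(overrides: &BTreeMap<String, String>) -> String {
    overrides
        .iter()
        .map(|(host, ip)| format!("{host}={ip}"))
        .collect::<Vec<_>>()
        .join(",")
}
```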
Suite 8 now ends with a config using RFC 5737 TEST-NET-1 IPs as
relay_ip/target_ip, started briefly so the bootstrap resolver logs its
override map. Asserts both host=IP pairs land in that map — closing the
gap flagged on PR #126 (zero-plain-DNS-leak for ODoH endpoints was only
unit-tested).
Also: NumaResolver::new now logs the override map at INFO when non-empty,
so operators can verify their ODoH bootstrap without needing DEBUG level.
When numa is its own system DNS resolver (HAOS add-on, Pi-hole-style
container, /etc/resolv.conf → 127.0.0.1), every numa-originated HTTPS
connection — DoH upstream, ODoH relay/target, blocklist CDN — routed
its hostname through getaddrinfo() back to numa itself. Cold boot
deadlocked; steady state taxed every new TCP connection. 0.14.1's
retry-with-backoff masked the startup race but not the underlying
self-loop.
NumaResolver implements reqwest::dns::Resolve with two lanes:
- Per-host overrides (ODoH relay_ip/target_ip) short-circuit DNS
entirely, preserving ODoH's zero-plain-DNS-leak property.
- Otherwise: A+AAAA in parallel via UDP to IP-literal bootstrap
servers, with TCP fallback for UDP-hostile networks.
Bootstrap IPs come from upstream.fallback (IP-literal filtered,
hostnames skipped with a warning). Empty fallback yields the
hardcoded default [9.9.9.9, 1.1.1.1]; the chosen source is logged
at startup so the silent default is visible.
doh_keepalive_loop now fires its first tick immediately, and
keepalive_doh logs failures at WARN — bootstrap issues surface
within ~100ms of boot instead of on the first client query.
Distinct from UpstreamPool.fallback (client-query failover) which
stays untouched: client queries with no configured fallback still
SERVFAIL on primary failure rather than silently shadow-routing.
Reproducer: tests/docker/self-resolver-loop.sh. Before: 0 blocklist
domains, 3072ms SERVFAIL. After: 397k domains, 118ms NOERROR.
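A std-only sketch of the two resolve lanes above; the real NumaResolver is async and implements reqwest's `dns::Resolve`, so the function shape, port choice, and bootstrap step here are illustrative:

```rust
use std::collections::BTreeMap;
use std::net::{IpAddr, SocketAddr};

// Lane 1: per-host override — no DNS at all (ODoH zero-plain-DNS-leak).
// Lane 2: ask IP-literal bootstrap servers instead of getaddrinfo, so
// the lookup can never route back into numa itself.
fn resolve_two_lane(
    host: &str,
    overrides: &BTreeMap<String, IpAddr>,
    bootstrap_lookup: impl Fn(&str) -> Vec<IpAddr>,
) -> Vec<SocketAddr> {
    if let Some(ip) = overrides.get(host) {
        return vec![SocketAddr::new(*ip, 443)];
    }
    bootstrap_lookup(host)
        .into_iter()
        .map(|ip| SocketAddr::new(ip, 443))
        .collect()
}
```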
resolve_coalesced now takes leader_path: QueryPath and applies it to all
three upstream branches (Forwarded-rule, Recursive, Upstream), not just
Recursive. Fixes thundering-herd at boot when N concurrent HTTPS setups
each trigger independent forward queries for the same upstream hostname.
Derive both the flaky-server drop count and the zero-delay schedule
from RETRY_DELAYS_SECS.len() so the tests keep exercising their
intended invariants — "succeeds on final attempt" and "gives up after
all attempts fail" — if the production retry schedule ever changes.
Also: rename fail_first → drop_first_n to match drop(sock); swap the
giveup test's empty body for an "unreachable" sentinel so a regression
that accidentally served couldn't silently match Some("").
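A sketch of deriving the fixtures from the schedule; the RETRY_DELAYS_SECS value and the attempts-equals-delays-plus-one convention are assumptions for illustration:

```rust
// Assumed production schedule; the point is that fixtures derive from
// its length rather than hardcoding counts.
const RETRY_DELAYS_SECS: [u64; 3] = [2, 10, 30];

// "succeeds on final attempt": drop every attempt except the last.
fn drop_first_n_for_final_success() -> usize {
    RETRY_DELAYS_SECS.len() // attempts = delays + 1; drop all but one
}

// "gives up after all attempts fail": drop every attempt.
fn drop_first_n_for_giveup() -> usize {
    RETRY_DELAYS_SECS.len() + 1
}
```

If the production schedule grows a fourth delay, both tests keep exercising their named invariants without edits.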
On cold start, reqwest's getaddrinfo can race numa's own first-query
cold-path latency — resolver timeout fires before numa warms its
upstream DoH connection. Wrap each blocklist fetch in 3 retries with
2s/10s/30s backoff; by the second attempt, the upstream is warm and
subsequent getaddrinfos succeed in <100ms.
Also: parallelize fetches across lists via join_all (different hosts,
no warming dependency), walk the full error source chain so reqwest
failures surface the underlying cause, and parameterize retry delays
for unit-test speed.
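The wrapper can be sketched with parameterized delays (function and error types are illustrative; the real code is async and walks reqwest's error source chain):

```rust
use std::time::Duration;

// Retry a fallible fetch, sleeping delays[n] after the n-th failure.
// Passing an empty or zeroed slice keeps unit tests fast.
fn fetch_with_retries<T, E: std::fmt::Display>(
    delays: &[Duration],
    mut fetch: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match fetch() {
            Ok(v) => return Ok(v),
            Err(e) if attempt < delays.len() => {
                // Real code logs the full error source chain here.
                eprintln!("fetch failed (attempt {}): {e}; retrying", attempt + 1);
                std::thread::sleep(delays[attempt]);
                attempt += 1;
            }
            Err(e) => return Err(e), // schedule exhausted
        }
    }
}
```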
Commit eb5ea3b generalised encryptionPct from (transport) to
(data, encryptedKeys, allKeys) and updated renderTransport and
renderUpstreamWire, but missed the call inside render() that computes
the inline `~N/s · M% enc` QPS tag. With undefined allKeys, the
first .reduce() threw TypeError and the render try/catch silently
downgraded the whole dashboard to "disconnected" — every panel left
empty even though /stats was returning real data.
Fix the call site to match the other two (inbound-wire keys) and have
the catch log to console, so the next silent-failure regression shows
up in DevTools within seconds instead of requiring a source dive.
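A Rust analog of the generalized helper's math (the real helper is dashboard JavaScript; key names here are illustrative):

```rust
use std::collections::HashMap;

// encryptionPct(data, encryptedKeys, allKeys): percentage of counted
// traffic whose wire key is in the encrypted set. Missing keys count 0,
// so a call site passing the wrong key list degrades, not throws.
fn encryption_pct(data: &HashMap<&str, u64>, encrypted: &[&str], all: &[&str]) -> u64 {
    let sum = |keys: &[&str]| {
        keys.iter()
            .map(|k| data.get(k).copied().unwrap_or(0))
            .sum::<u64>()
    };
    let total = sum(all);
    if total == 0 { 0 } else { sum(encrypted) * 100 / total }
}
```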
Plain host-string equality caught the copy-paste-same-URL footgun but
let `r.cloudflare.com` + `odoh.cloudflare.com` through — two subdomains
of the same operator collapse ODoH to ordinary DoH. Add a second layer:
compare registrable domains via the PSL (`psl` crate) after the exact-
host check. Fails open on IP literals and unparseable hosts; the exact-
host check still runs in those cases.
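A simplified sketch of the layered check. The real code consults the Public Suffix List via the `psl` crate; this version hardcodes a one-label suffix rule (right for .com, wrong for multi-label suffixes like .co.uk) purely to show the shape:

```rust
// Registrable domain under the one-label-suffix simplification.
// None = IP literal or unparseable host: the PSL layer fails open
// and only the exact-host check applies.
fn registrable(host: &str) -> Option<String> {
    if host.parse::<std::net::IpAddr>().is_ok() {
        return None; // IP literal
    }
    let labels: Vec<&str> = host.split('.').collect();
    let n = labels.len();
    if n < 2 {
        return None; // single label / unparseable
    }
    Some(format!("{}.{}", labels[n - 2], labels[n - 1]))
}

// Second layer after the exact-host check: same operator if the
// registrable domains match.
fn same_operator(relay: &str, target: &str) -> bool {
    match (registrable(relay), registrable(target)) {
        (Some(a), Some(b)) => a.eq_ignore_ascii_case(&b),
        _ => false, // fail open
    }
}
```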
Adds the v0.14.0 capability where it's most differentiating: the first
paragraph (sealed-query framing alongside the existing ad-blocking and
.numa-domain pitches) and the second paragraph (numa relay as a public
ODoH endpoint, with the DNSCrypt-list supply-doubling angle as fact).
No reposition: tagline and structure unchanged. ODoH joins the
existing capability set rather than displacing it. Hero GIF stays;
will be re-recorded once the dashboard's Outbound Wire panel is worth
showing in motion.
Headline: ODoH (RFC 9230) — client + self-hosted relay. Set
mode = "odoh" in [upstream] to seal queries before they leave the
machine; run `numa relay` to add to the public ODoH ecosystem.
- Hoist ODOH_CONTENT_TYPE to a single pub(crate) constant in odoh.rs;
relay.rs imports it instead of declaring its own.
- Generalize dashboard encryptionPct(data, encryptedKeys, allKeys)
so both Inbound Wire and Outbound Wire panels share the same math
instead of drifting independently.
- Extract RelayState::new() and build_app() helpers in relay.rs so
the test spawn_relay() and production run() wire the same router
+ body-limit layer. Prevents future middleware from landing in one
path but not the other.
All 344 lib tests pass; no behavior change.
Two-container deploy: Caddy terminates TLS (auto-provisions Let's
Encrypt via ACME) and reverse-proxies to a Numa relay on an internal
Docker network. The relay never reads sealed payloads; Caddy's
access log is discarded so per-request observability doesn't defeat
the oblivious property.
Validated against Hetzner CX22 + DNS at odoh-relay.numa.rs:
- TLS-ALPN-01 challenge succeeded on first attempt
- /health returned the relay's counter block
- End-to-end ODoH client → relay → Cloudflare works
Operators only need to: set a DNS A record, edit Caddyfile's hostname,
docker compose up -d. README walks through the steps and the DNSCrypt
v3/odoh-relays.md submission to claim a public listing.
- `numa relay [PORT] [BIND]` accepts an optional bind address (defaults
to 127.0.0.1, matching the Caddy reverse-proxy deployment shape).
Required for Docker, where the relay needs 0.0.0.0 inside the
container so Caddy can reach it across the bridge network.
- Dashboard now surfaces the upstream_transport dimension as an
"Outbound Wire" panel alongside the existing "Inbound Wire" (renamed
from "Transport" for directional clarity). Sub-headers — "apps → numa"
/ "numa → internet" — make the threat-model split obvious without
jargon. Bars: UDP/DoH/DoT/ODoH, headline "X% encrypted outbound".
The PR description's promise that the dashboard honestly answers "how
much of my DNS traffic left in cleartext" is now true.
Two issues surfaced from running mode = "odoh" against the live Hetzner
relay as system DNS:
1. **Bootstrap deadlock.** The reqwest HTTPS client resolves the relay
and target hostnames via system DNS. When numa is itself the system
resolver, the ODoH client loops trying to resolve through itself.
Adds optional `relay_ip` and `target_ip` to `[upstream]`, plumbed
into reqwest's `resolve()` so the HTTPS client bypasses system DNS
for those two hostnames. TLS still validates against the URL
hostname, so a stale IP fails loudly rather than silently MITM'ing.
2. **2x relay load.** Default `hedge_ms = 10` triggers a duplicate
in-flight query for every request. Useful for UDP/DoH/DoT (rescues
tail latency cheaply); wasteful for ODoH (doubles HPKE seal/unseal,
doubles sealed-byte footprint a passive observer can correlate, no
latency win — relay hop dominates either way). Force-zero in
oblivious mode regardless of configured hedge_ms.
Validated end-to-end against odoh-relay.numa.rs → Cloudflare:
3 digs produced 3 forwarded_ok on the relay (was 6 before the hedge
fix), upstream_transport.odoh ticks correctly.
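The force-zero rule reduces to a small guard; enum and function names here are illustrative:

```rust
// Hedging is forced off in oblivious mode regardless of configured
// hedge_ms: no latency win (the relay hop dominates), doubled HPKE
// seal/unseal, and a doubled sealed-byte footprint for observers.
enum UpstreamMode {
    Udp,
    Doh,
    Dot,
    Odoh,
}

fn effective_hedge_ms(mode: &UpstreamMode, configured: u64) -> u64 {
    match mode {
        UpstreamMode::Odoh => 0,
        _ => configured,
    }
}
```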
Client (mode = "odoh"): URL-query target routing per RFC 9230 §5,
/.well-known/odohconfigs TTL cache with 60s backoff on failure, HPKE
seal/open via odoh-rs, strict-mode default that SERVFAILs on relay
failure instead of silently downgrading. Host-equality config
validation rejects same-operator relay/target pairs.
Relay (`numa relay [PORT]`): axum server with /relay + /health.
SSRF-hardened hostname validator (RFC 1035 ASCII + dot + dash),
4 KiB body cap at the axum layer, 5s full-transaction timeout, and
static 502 on target failure (reqwest internals logged, not leaked).
Aggregate counters only — no per-request logs.
Observability: new `UpstreamTransport { Udp, Doh, Dot, Odoh }`
orthogonal to `QueryPath`, so /stats can tally wire protocols
symmetrically. Recursive mode records `Some(Udp)` for honest
"bytes egressing in cleartext" accounting.
Tests: Suite 8 exercises the client end-to-end via Frank Denis's
public relay + Cloudflare target; Suite 9 exercises `numa relay`
forwarding + guards against Cloudflare as the real far end. Full
probe script at tests/probe-odoh-ecosystem.sh verifies the entire
public ODoH ecosystem (4 targets + 1 relay per DNSCrypt's curated
list — confirms deploying Numa's relay doubles global supply).
Adding a record type used to require 5 edits across the file (enum
variant, to_num, from_num, as_str, parse_str). The macro takes a
single (variant, num, str) tuple per type and generates the enum
plus all four methods.
UNKNOWN(u16) stays hand-coded since it carries data and can't sit
in the table.
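A sketch of the table-driven shape (macro name, method signatures, and the Option returns are illustrative; the hand-coded UNKNOWN(u16) variant noted above is omitted):

```rust
// One (variant, num, str) row per RR type generates the enum plus all
// four conversion methods, collapsing the former 5-edit ritual to 1.
macro_rules! query_types {
    ($(($variant:ident, $num:literal, $name:literal)),+ $(,)?) => {
        #[derive(Clone, Copy, Debug, PartialEq, Eq)]
        enum QueryType { $($variant),+ }

        impl QueryType {
            fn to_num(&self) -> u16 {
                match self { $(QueryType::$variant => $num),+ }
            }
            fn from_num(n: u16) -> Option<QueryType> {
                match n { $($num => Some(QueryType::$variant)),+, _ => None }
            }
            fn as_str(&self) -> &'static str {
                match self { $(QueryType::$variant => $name),+ }
            }
            fn parse_str(s: &str) -> Option<QueryType> {
                match s { $($name => Some(QueryType::$variant)),+, _ => None }
            }
        }
    };
}

query_types! {
    (A, 1, "A"),
    (NS, 2, "NS"),
    (CNAME, 5, "CNAME"),
    (SOA, 6, "SOA"),
    (SVCB, 64, "SVCB"),
}
```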
src/question.rs: 156 lines → 92 lines, no behavior change.
Logs were printing UNKNOWN(64), UNKNOWN(29), UNKNOWN(35) for SVCB,
LOC, and NAPTR — three RR types that have been registered for years
and show up in the wild (notably SVCB via RFC 9462 DDR clients
querying _dns.resolver.arpa).
Adds the variants and replaces the SVCB_QTYPE u16 const introduced
in #119 with QueryType::SVCB.to_num(), matching the HTTPS path.
Closes #114.