Commit Graph

362 Commits

Author SHA1 Message Date
Razvan Dimescu
93f0ea7501 Merge pull request #145 from razvandimescu/docs/recipes
docs: lift user-facing guides to recipes/, drop dangling docs/ refs
2026-04-24 15:22:44 +03:00
Razvan Dimescu
f7f35b3424 docs: lift user-facing guides to recipes/, drop dangling docs/ refs
docs/ is gitignored; references to docs/implementation/*.md from public
source, configs, and packaging were dead links outside the maintainer
machine. Adds four recipes (README, dnsdist-front, doh-on-lan,
odoh-upstream) under top-level recipes/ and repoints existing pointers.

- numa.toml, packaging/client/{README.md,numa.toml}: point to
  recipes/odoh-upstream.md.
- src/{bootstrap_resolver,forward,serve}.rs: reference issue #122
  directly (module scope is broader than the ODoH-specific recipe).
- src/health.rs: drop the §-ref; iOS HealthInfo remains named as the
  canonical consumer.
2026-04-24 15:09:16 +03:00
Razvan Dimescu
3913d42319 Merge pull request #137 from razvandimescu/fix/soa-compression-roundtrip
fix(packet): parse SOA natively to stop malformed replies (#128)
2026-04-24 13:59:57 +03:00
Razvan Dimescu
e702f5861b Update README.md to remove outdated listing information
Removed section about listing on the public ecosystem and DNSCrypt's canonical list.
2026-04-23 09:39:34 +03:00
Razvan Dimescu
933643f2c7 Merge pull request #139 from razvandimescu/fix/odoh-relay-doc-path
docs(config): fix ODoH relay path in numa.toml example
2026-04-23 08:58:53 +03:00
Razvan Dimescu
96cf778bea docs(config): fix ODoH relay path in numa.toml example
The example in `numa.toml` pointed at `https://odoh-relay.numa.rs/proxy`,
but the relay only serves the ODoH endpoint at `/relay` (every other
reference in the tree — `src/config.rs` docs and tests, and
`packaging/client/numa.toml` — uses `/relay`). Users who copied the
example got `404 Not Found` on every query and SERVFAIL at the client.

Reported in #138.
2026-04-23 08:53:35 +03:00
Razvan Dimescu
2274151c17 fix(packet): parse SOA natively to stop malformed replies (#128)
SOA records were stored as opaque bytes (DnsRecord::UNKNOWN), so the
RFC 1035 §3.3.13 MNAME/RNAME name-compression pointers — offsets into
the upstream packet — were re-emitted verbatim. Once Numa applied its
own compression to surrounding names, those pointers landed on garbage
and clients rejected the reply ("malformed reply packet" in kdig).

Parse SOA via read_qname and write via write_qname, matching the
NS/CNAME/MX pattern. Adds the canonical-rdata arm in dnssec.rs for
RRSIG verification. Regression test round-trips a CNAME-chain response
with a compressed SOA in authority through hickory-proto strict parse.
2026-04-23 00:36:02 +03:00
Razvan Dimescu
c787de1548 chore: bump version to 0.14.2 v0.14.2 2026-04-22 23:57:37 +03:00
Razvan Dimescu
e6e79273b9 Revert "chore: bump version to 0.15.0"
This reverts commit 3ec3b40830.
2026-04-22 23:57:28 +03:00
Razvan Dimescu
3ec3b40830 chore: bump version to 0.15.0 2026-04-22 23:50:20 +03:00
Razvan Dimescu
90fa79bc0f Merge pull request #135 from razvandimescu/fix/hedge-default-off
fix(upstream): default hedge_ms=0 to avoid silent 2x upstream query count
2026-04-22 23:49:15 +03:00
Razvan Dimescu
b8a125b598 fix(upstream): default hedge_ms=0 to avoid silent 2x upstream query count
Hedging fires a second upstream query against the same upstream after
the hedge delay. Rescues packet loss and handshake stalls on flaky
links, but every lookup shows up twice at the provider — silently
halves the headroom for anyone on a quota'd upstream (NextDNS free tier,
Control D, paid Quad9).

Surfaced by #134 (bcookatpcsd), who saw every query duplicated on the
NextDNS dashboard with a single-address DoT upstream. Not a bug — the
feature doing what it says on the tin — but a surprising default.

Flipping the default to 0 makes hedging explicitly opt-in. Users who
want tail-latency rescue on flaky nets add `hedge_ms = 10` (or higher).
No config migration needed; no breaking changes to the API surface.

Also tightens the numa.toml comment so the trade-off is visible at
config time, not retroactively on a provider dashboard.
2026-04-22 23:30:55 +03:00
Razvan Dimescu
bc30be94e7 Merge pull request #131 from razvandimescu/feat/packaging-client-docker
feat(packaging): ODoH client Docker deploy recipe
2026-04-22 23:11:50 +03:00
Razvan Dimescu
26b1cd5917 feat(packaging): ODoH client Docker deploy
Single-container docker-compose recipe for running numa in ODoH client
mode. Ships with a starter numa.toml pointing at odoh-relay.numa.rs
paired with Cloudflare's ODoH target — two independent operators with
distinct eTLD+1s, so the default passes numa's same-operator check.

Exposes :53 UDP+TCP for LAN clients and :5380 for the dashboard + REST
API. README covers prerequisites, deploy, verification, and the ODoH
privacy boundary (relay sees IP, target sees query, neither sees both).

Advertised alongside packaging/relay/ in the main README Docker section.
2026-04-22 18:05:46 +03:00
Razvan Dimescu
77d6d89f80 Merge pull request #130 from razvandimescu/docs/numa-toml-odoh-examples
docs(config): ODoH upstream examples with relay_ip/target_ip pinning
2026-04-22 17:20:19 +03:00
Razvan Dimescu
4fdd05f284 Merge pull request #132 from razvandimescu/chore/site-live-reload
chore(site): live-reload dev server
2026-04-22 17:17:37 +03:00
Razvan Dimescu
2e461ccc0f docs(config): add ODoH upstream examples with relay_ip/target_ip pinning
Complements the bootstrap resolver fix (#122, #126) by documenting the
ODoH knobs in the commented config template. Explains relay_ip/target_ip
as the way to prevent plain-DNS leaks of the relay/target hostnames via
the bootstrap resolver on cold boot when numa is its own system DNS.
2026-04-22 17:13:13 +03:00
Razvan Dimescu
bf84c44346 Merge pull request #133 from razvandimescu/chore/cargo-audit-rustls-webpki
chore: bump rustls-webpki to 0.103.13 (RUSTSEC-2026-0104)
2026-04-22 17:03:58 +03:00
Razvan Dimescu
df2062882c chore: bump rustls-webpki to 0.103.13 for RUSTSEC-2026-0104
Advisory published 2026-04-22: reachable panic in certificate revocation
list parsing. Patch is a lockfile-only bump — transitive via rustls, no
direct dep changes. Unblocks cargo audit in CI across all open PRs.
2026-04-22 16:42:10 +03:00
Razvan Dimescu
76dda89078 Merge pull request #129 from razvandimescu/chore/gitignore-claude
chore: gitignore .claude/ harness state
2026-04-22 16:39:56 +03:00
Razvan Dimescu
640b64bf7e chore(site): live-reload dev server via chokidar + browser-sync
Replaces the plain python3 http.server + one-shot make blog with a
watcher pipeline: chokidar regenerates HTML on MD/template changes,
browser-sync serves the site and reloads the browser on rendered-asset
changes. First run downloads both via npx; subsequent runs are instant.

Preflight checks for npx and pandoc. Port arg parsing is tolerant of
legacy --drafts flag ordering (drafts are always included now, since
that's what the dev loop actually wants).

Cleanup trap kills the watcher on exit so re-runs don't leave orphans.
2026-04-22 15:50:21 +03:00
Razvan Dimescu
5ba19e04c8 chore: gitignore local Claude Code harness state
.claude/ holds per-session harness files (settings.local.json, task
locks, worktree metadata). None of it belongs in the repo.
2026-04-22 15:49:58 +03:00
Razvan Dimescu
c98afafaa1 Merge pull request #127 from razvandimescu/refactor/bootstrap-btreemap
refactor(bootstrap): BTreeMap for overrides + simplify review
2026-04-21 18:41:49 +03:00
Razvan Dimescu
5cba02a6c8 refactor(bootstrap): BTreeMap for overrides + simplify review
- Switch overrides from HashMap to BTreeMap — deterministic iteration by
  type, drops the manual sort when logging.
- Rename the flat_map closure's inner `ips` to `addrs` to stop shadowing
  the outer Vec<String>.
- Trim the Suite 8 TEST-NET-1 comment to keep the "why" and drop
  mechanism narration.
- Drop a redundant sleep 1 after wait — wait already blocks on exit.
2026-04-21 18:37:35 +03:00
Razvan Dimescu
46a95d58aa Merge pull request #126 from razvandimescu/fix/self-resolver-loop
fix(bootstrap): route numa HTTPS via IP-literal bootstrap resolver (#122)
2026-04-21 17:52:51 +03:00
Razvan Dimescu
51cce0347b test(odoh): integration-verify relay_ip/target_ip override wiring
Suite 8 now ends with a config using RFC 5737 TEST-NET-1 IPs as
relay_ip/target_ip, started briefly so the bootstrap resolver logs its
override map. Asserts both host=IP pairs land in that map — closing the
gap flagged on PR #126 (zero-plain-DNS-leak for ODoH endpoints was only
unit-tested).

Also: NumaResolver::new now logs the override map at INFO when non-empty,
so operators can verify their ODoH bootstrap without needing DEBUG level.
2026-04-21 17:43:02 +03:00
Razvan Dimescu
459395203d style: cargo fmt 2026-04-21 16:30:26 +03:00
Razvan Dimescu
10469e96bd fix(bootstrap): route numa HTTPS via IP-literal bootstrap resolver (#122)
When numa is its own system DNS resolver (HAOS add-on, Pi-hole-style
container, /etc/resolv.conf → 127.0.0.1), every numa-originated HTTPS
connection — DoH upstream, ODoH relay/target, blocklist CDN — routed
its hostname through getaddrinfo() back to numa itself. Cold boot
deadlocked; steady state taxed every new TCP connection. 0.14.1's
retry-with-backoff masked the startup race but not the underlying
self-loop.

NumaResolver implements reqwest::dns::Resolve with two lanes:
- Per-host overrides (ODoH relay_ip/target_ip) short-circuit DNS
  entirely, preserving ODoH's zero-plain-DNS-leak property.
- Otherwise: A+AAAA in parallel via UDP to IP-literal bootstrap
  servers, with TCP fallback for UDP-hostile networks.

Bootstrap IPs come from upstream.fallback (IP-literal filtered,
hostnames skipped with a warning). Empty fallback yields the
hardcoded default [9.9.9.9, 1.1.1.1]; the chosen source is logged
at startup so the silent default is visible.

doh_keepalive_loop now fires its first tick immediately, and
keepalive_doh logs failures at WARN — bootstrap issues surface
within ~100ms of boot instead of on the first client query.

Distinct from UpstreamPool.fallback (client-query failover) which
stays untouched: client queries with no configured fallback still
SERVFAIL on primary failure rather than silently shadow-routing.

Reproducer: tests/docker/self-resolver-loop.sh. Before: 0 blocklist
domains, 3072ms SERVFAIL. After: 397k domains, 118ms NOERROR.
2026-04-21 16:19:14 +03:00
Razvan Dimescu
31adc31c9b refactor(ctx): coalesce forward-path upstream queries
resolve_coalesced now takes leader_path: QueryPath and applies to all
three upstream branches (Forwarded-rule, Recursive, Upstream), not just
Recursive. Fixes thundering-herd at boot when N concurrent HTTPS setups
each trigger independent forward queries for the same upstream hostname.
2026-04-21 16:18:52 +03:00
Razvan Dimescu
60600b045f chore: bump version to 0.14.1 v0.14.1 2026-04-20 19:27:06 +03:00
Razvan Dimescu
3e6bf3feb0 Merge pull request #125 from razvandimescu/worktree-fix-blocklist-bootstrap
fix(blocklist): retry on transient download failures (#122)
2026-04-20 19:22:04 +03:00
Razvan Dimescu
8bed7c4649 test(blocklist): decouple retry tests from RETRY_DELAYS_SECS length
Derive both the flaky-server drop count and the zero-delay schedule
from RETRY_DELAYS_SECS.len() so the tests keep exercising their
intended invariants — "succeeds on final attempt" and "gives up after
all attempts fail" — if the production retry schedule ever changes.

Also: rename fail_first → drop_first_n to match drop(sock); swap the
giveup test's empty body for an "unreachable" sentinel so a regression
that accidentally served couldn't silently match Some("").
2026-04-20 19:19:43 +03:00
Razvan Dimescu
5b1642c6dc fix(blocklist): retry on transient download failures (#122)
On cold start, reqwest's getaddrinfo can race numa's own first-query
cold-path latency — resolver timeout fires before numa warms its
upstream DoH connection. Wrap each blocklist fetch in 3 retries with
2s/10s/30s backoff; by the second attempt, the upstream is warm and
subsequent getaddrinfos succeed in <100ms.

Also: parallelize fetches across lists via join_all (different hosts,
no warming dependency), walk the full error source chain so reqwest
failures surface the underlying cause, and parameterize retry delays
for unit-test speed.
2026-04-20 19:19:43 +03:00
Razvan Dimescu
01fda7891e Merge pull request #123 from razvandimescu/feat/odoh-etld1-check
feat(odoh): reject relay+target sharing an eTLD+1
2026-04-20 19:06:12 +03:00
Razvan Dimescu
5e84adbd94 Merge pull request #124 from razvandimescu/fix/dashboard-encryption-pct-args
fix(dashboard): pass missing args to encryptionPct in refresh()
2026-04-20 19:05:50 +03:00
Razvan Dimescu
15978a7859 fix(dashboard): pass missing args to encryptionPct in refresh()
Commit eb5ea3b generalised encryptionPct from (transport) to
(data, encryptedKeys, allKeys) and updated renderTransport and
renderUpstreamWire, but missed the call inside render() that computes
the inline `~N/s · M% enc` QPS tag. With undefined allKeys, the
first .reduce() threw TypeError and the render try/catch silently
downgraded the whole dashboard to "disconnected" — every panel left
empty even though /stats was returning real data.

Fix the call site to match the other two (inbound-wire keys) and have
the catch log to console so the next silent-failure regression shows
up in DevTools within seconds instead of a source dive.
2026-04-20 19:04:15 +03:00
Razvan Dimescu
193b38b85f feat(odoh): reject relay+target sharing an eTLD+1
Plain host-string equality caught the copy-paste-same-URL footgun but
let `r.cloudflare.com` + `odoh.cloudflare.com` through — two subdomains
of the same operator collapse ODoH to ordinary DoH. Add a second layer:
compare registrable domains via the PSL (`psl` crate) after the exact-
host check. Fails open on IP literals and unparseable hosts; the exact-
host check still runs in those cases.
2026-04-20 18:46:54 +03:00
Razvan Dimescu
4c685d1602 docs(readme): pamper readme still 2026-04-20 17:19:16 +03:00
Razvan Dimescu
cd6e686a1a docs(readme): surface ODoH in the intro paragraph
Adds the v0.14.0 capability where it's most differentiating: the first
paragraph (sealed-query framing alongside the existing ad-blocking and
.numa-domain pitches) and the second paragraph (numa relay as a public
ODoH endpoint, with the DNSCrypt-list supply-doubling angle as fact).

No reposition: tagline and structure unchanged. ODoH joins the
existing capability set rather than displacing it. Hero GIF stays;
will be re-recorded once the dashboard's Outbound Wire panel is worth
showing in motion.
2026-04-20 17:14:21 +03:00
Razvan Dimescu
07c321f749 chore(release): bump to v0.14.0
Headline: ODoH (RFC 9230) — client + self-hosted relay. Set
mode = "odoh" in [upstream] to seal queries before they leave the
machine; run `numa relay` to add to the public ODoH ecosystem.
v0.14.0
2026-04-20 17:07:31 +03:00
Razvan Dimescu
12a06a1410 Merge pull request #121 from razvandimescu/feat/odoh
feat(odoh): ship ODoH client + self-hosted relay (RFC 9230)
2026-04-20 16:26:54 +03:00
Razvan Dimescu
eb5ea3b645 refactor(odoh): deduplicate post-audit findings
- Hoist ODOH_CONTENT_TYPE to a single pub(crate) constant in odoh.rs;
  relay.rs imports it instead of declaring its own.
- Generalize dashboard encryptionPct(data, encryptedKeys, allKeys)
  so both Inbound Wire and Outbound Wire panels share the same math
  instead of drifting independently.
- Extract RelayState::new() and build_app() helpers in relay.rs so
  the test spawn_relay() and production run() wire the same router
  + body-limit layer. Prevents future middleware from landing in one
  path but not the other.

All 344 lib tests pass; no behavior change.
2026-04-20 16:03:34 +03:00
Razvan Dimescu
be60f6ccbc chore(packaging): docker-compose + Caddyfile for ODoH relay deploy
Two-container deploy: Caddy terminates TLS (auto-provisions Let's
Encrypt via ACME) and reverse-proxies to a Numa relay on an internal
Docker network. The relay never reads sealed payloads; Caddy's
access log is discarded so per-request observability doesn't defeat
the oblivious property.

Validated against Hetzner CX22 + DNS at odoh-relay.numa.rs:
- TLS-ALPN-01 challenge succeeded on first attempt
- /health returned the relay's counter block
- End-to-end ODoH client → relay → Cloudflare works

Operators only need to: set a DNS A record, edit Caddyfile's hostname,
docker compose up -d. README walks through the steps and the DNSCrypt
v3/odoh-relays.md submission to claim a public listing.
2026-04-20 15:44:29 +03:00
Razvan Dimescu
a3cc64c94f feat(odoh): relay bind-address CLI arg + dashboard Outbound Wire panel
- `numa relay [PORT] [BIND]` accepts an optional bind address (defaults
  to 127.0.0.1, matching the Caddy reverse-proxy deployment shape).
  Required for Docker, where the relay needs 0.0.0.0 inside the
  container so Caddy can reach it across the bridge network.

- Dashboard now surfaces the upstream_transport dimension as an
  "Outbound Wire" panel alongside the existing "Inbound Wire" (renamed
  from "Transport" for directional clarity). Sub-headers — "apps → numa"
  / "numa → internet" — make the threat-model split obvious without
  jargon. Bars: UDP/DoH/DoT/ODoH, headline "X% encrypted outbound".
  The PR description's promise that "the dashboard answers how much of
  my DNS traffic left in cleartext honestly" is now true.
2026-04-20 15:44:20 +03:00
Razvan Dimescu
cf128c19af feat(odoh): bootstrap-IP overrides + zero hedge for ODoH (post-deploy fixes)
Two issues surfaced from running mode = "odoh" against the live Hetzner
relay as system DNS:

1. **Bootstrap deadlock.** The reqwest HTTPS client resolves the relay
   and target hostnames via system DNS. When numa is itself the system
   resolver, the ODoH client loops trying to resolve through itself.
   Adds optional `relay_ip` and `target_ip` to `[upstream]`, plumbed
   into reqwest's `resolve()` so the HTTPS client bypasses system DNS
   for those two hostnames. TLS still validates against the URL
   hostname, so a stale IP fails loudly rather than silently MITM'ing.

2. **2x relay load.** Default `hedge_ms = 10` triggers a duplicate
   in-flight query for every request. Useful for UDP/DoH/DoT (rescues
   tail latency cheaply); wasteful for ODoH (doubles HPKE seal/unseal,
   doubles sealed-byte footprint a passive observer can correlate, no
   latency win — relay hop dominates either way). Force-zero in
   oblivious mode regardless of configured hedge_ms.

Validated end-to-end against odoh-relay.numa.rs → Cloudflare:
3 digs produced 3 forwarded_ok on the relay (was 6 before the hedge
fix), upstream_transport.odoh ticks correctly.
2026-04-20 15:44:09 +03:00
Razvan Dimescu
241c40553b feat(odoh): ship ODoH client + self-hosted relay (RFC 9230)
Client (mode = "odoh"): URL-query target routing per RFC 9230 §5,
/.well-known/odohconfigs TTL cache with 60s backoff on failure, HPKE
seal/open via odoh-rs, strict-mode default that SERVFAILs on relay
failure instead of silently downgrading. Host-equality config
validation rejects same-operator relay/target pairs.

Relay (`numa relay [PORT]`): axum server with /relay + /health.
SSRF-hardened hostname validator (RFC 1035 ASCII + dot + dash),
4 KiB body cap at the axum layer, 5s full-transaction timeout, and
static 502 on target failure (reqwest internals logged, not leaked).
Aggregate counters only — no per-request logs.

Observability: new `UpstreamTransport { Udp, Doh, Dot, Odoh }`
orthogonal to `QueryPath`, so /stats can tally wire protocols
symmetrically. Recursive mode records `Some(Udp)` for honest
"bytes egressing in cleartext" accounting.

Tests: Suite 8 exercises the client end-to-end via Frank Denis's
public relay + Cloudflare target; Suite 9 exercises `numa relay`
forwarding + guards against Cloudflare as the real far end. Full
probe script at tests/probe-odoh-ecosystem.sh verifies the entire
public ODoH ecosystem (4 targets + 1 relay per DNSCrypt's curated
list — confirms deploying Numa's relay doubles global supply).
2026-04-20 12:34:04 +03:00
Razvan Dimescu
f6cfb3ce1b Merge pull request #120 from razvandimescu/feat/named-record-types
feat(question): name SVCB/LOC/NAPTR record types in logs
2026-04-19 08:08:54 +03:00
Razvan Dimescu
5725f94ff3 refactor(question): collapse QueryType impls behind define_qtypes! macro
Adding a record type used to require 5 edits across the file (enum
variant, to_num, from_num, as_str, parse_str). The macro takes a
single (variant, num, str) tuple per type and generates the enum
plus all four methods.

UNKNOWN(u16) stays hand-coded since it carries data and can't sit
in the table.

src/question.rs: 156 lines -> 92 lines, no behavior change.
2026-04-19 08:01:18 +03:00
Razvan Dimescu
24610ae3fe feat(question): add SVCB, LOC, NAPTR variants to QueryType
Logs were printing UNKNOWN(64), UNKNOWN(29), UNKNOWN(35) for SVCB,
LOC, and NAPTR — three RR types that have been registered for years
and show up in the wild (notably SVCB via RFC 9462 DDR clients
querying _dns.resolver.arpa).

Adds the variants and replaces the SVCB_QTYPE u16 const introduced
in #119 with QueryType::SVCB.to_num(), matching the HTTPS path.

Closes #114.
2026-04-19 07:49:35 +03:00
Razvan Dimescu
6bc02982f0 Merge pull request #119 from razvandimescu/feat/filter-aaaa
feat(resolver): filter_aaaa for IPv4-only networks
2026-04-19 07:31:27 +03:00