151 Commits

Author SHA1 Message Date
Razvan Dimescu
60600b045f chore: bump version to 0.14.1 2026-04-20 19:27:06 +03:00
Razvan Dimescu
3e6bf3feb0 Merge pull request #125 from razvandimescu/worktree-fix-blocklist-bootstrap
fix(blocklist): retry on transient download failures (#122)
2026-04-20 19:22:04 +03:00
Razvan Dimescu
8bed7c4649 test(blocklist): decouple retry tests from RETRY_DELAYS_SECS length
Derive both the flaky-server drop count and the zero-delay schedule
from RETRY_DELAYS_SECS.len() so the tests keep exercising their
intended invariants — "succeeds on final attempt" and "gives up after
all attempts fail" — if the production retry schedule ever changes.

Also: rename fail_first → drop_first_n to match drop(sock); swap the
giveup test's empty body for an "unreachable" sentinel so a regression
that accidentally served couldn't silently match Some("").
2026-04-20 19:19:43 +03:00
Razvan Dimescu
5b1642c6dc fix(blocklist): retry on transient download failures (#122)
On cold start, reqwest's getaddrinfo can race numa's own first-query
cold-path latency — resolver timeout fires before numa warms its
upstream DoH connection. Wrap each blocklist fetch in 3 retries with
2s/10s/30s backoff; by the second attempt, the upstream is warm and
subsequent getaddrinfos succeed in <100ms.

Also: parallelize fetches across lists via join_all (different hosts,
no warming dependency), walk the full error source chain so reqwest
failures surface the underlying cause, and parameterize retry delays
for unit-test speed.
2026-04-20 19:19:43 +03:00
Razvan Dimescu
01fda7891e Merge pull request #123 from razvandimescu/feat/odoh-etld1-check
feat(odoh): reject relay+target sharing an eTLD+1
2026-04-20 19:06:12 +03:00
Razvan Dimescu
5e84adbd94 Merge pull request #124 from razvandimescu/fix/dashboard-encryption-pct-args
fix(dashboard): pass missing args to encryptionPct in refresh()
2026-04-20 19:05:50 +03:00
Razvan Dimescu
15978a7859 fix(dashboard): pass missing args to encryptionPct in refresh()
Commit eb5ea3b generalised encryptionPct from (transport) to
(data, encryptedKeys, allKeys) and updated renderTransport and
renderUpstreamWire, but missed the call inside render() that computes
the inline `~N/s · M% enc` QPS tag. With undefined allKeys, the
first .reduce() threw TypeError and the render try/catch silently
downgraded the whole dashboard to "disconnected" — every panel left
empty even though /stats was returning real data.

Fix the call site to match the other two (inbound-wire keys) and have
the catch log to console so the next silent-failure regression shows
up in DevTools within seconds instead of a source dive.
2026-04-20 19:04:15 +03:00
Razvan Dimescu
193b38b85f feat(odoh): reject relay+target sharing an eTLD+1
Plain host-string equality caught the copy-paste-same-URL footgun but
let `r.cloudflare.com` + `odoh.cloudflare.com` through — two subdomains
of the same operator collapse ODoH to ordinary DoH. Add a second layer:
compare registrable domains via the PSL (`psl` crate) after the exact-
host check. Fails open on IP literals and unparseable hosts; the exact-
host check still runs in those cases.
2026-04-20 18:46:54 +03:00
Razvan Dimescu
4c685d1602 docs(readme): pamper readme still 2026-04-20 17:19:16 +03:00
Razvan Dimescu
cd6e686a1a docs(readme): surface ODoH in the intro paragraph
Adds the v0.14.0 capability where it's most differentiating: the first
paragraph (sealed-query framing alongside the existing ad-blocking and
.numa-domain pitches) and the second paragraph (numa relay as a public
ODoH endpoint, with the DNSCrypt-list supply-doubling angle as fact).

No reposition: tagline and structure unchanged. ODoH joins the
existing capability set rather than displacing it. Hero GIF stays;
will be re-recorded once the dashboard's Outbound Wire panel is worth
showing in motion.
2026-04-20 17:14:21 +03:00
Razvan Dimescu
07c321f749 chore(release): bump to v0.14.0
Headline: ODoH (RFC 9230) — client + self-hosted relay. Set
mode = "odoh" in [upstream] to seal queries before they leave the
machine; run `numa relay` to add to the public ODoH ecosystem.
2026-04-20 17:07:31 +03:00
Razvan Dimescu
12a06a1410 Merge pull request #121 from razvandimescu/feat/odoh
feat(odoh): ship ODoH client + self-hosted relay (RFC 9230)
2026-04-20 16:26:54 +03:00
Razvan Dimescu
eb5ea3b645 refactor(odoh): deduplicate post-audit findings
- Hoist ODOH_CONTENT_TYPE to a single pub(crate) constant in odoh.rs;
  relay.rs imports it instead of declaring its own.
- Generalize dashboard encryptionPct(data, encryptedKeys, allKeys)
  so both Inbound Wire and Outbound Wire panels share the same math
  instead of drifting independently.
- Extract RelayState::new() and build_app() helpers in relay.rs so
  the test spawn_relay() and production run() wire the same router
  + body-limit layer. Prevents future middleware from landing in one
  path but not the other.

All 344 lib tests pass; no behavior change.
2026-04-20 16:03:34 +03:00
Razvan Dimescu
be60f6ccbc chore(packaging): docker-compose + Caddyfile for ODoH relay deploy
Two-container deploy: Caddy terminates TLS (auto-provisions Let's
Encrypt via ACME) and reverse-proxies to a Numa relay on an internal
Docker network. The relay never reads sealed payloads; Caddy's
access log is discarded so per-request observability doesn't defeat
the oblivious property.

Validated against Hetzner CX22 + DNS at odoh-relay.numa.rs:
- TLS-ALPN-01 challenge succeeded on first attempt
- /health returned the relay's counter block
- End-to-end ODoH client → relay → Cloudflare works

Operators only need to: set a DNS A record, edit Caddyfile's hostname,
docker compose up -d. README walks through the steps and the DNSCrypt
v3/odoh-relays.md submission to claim a public listing.
2026-04-20 15:44:29 +03:00
Razvan Dimescu
a3cc64c94f feat(odoh): relay bind-address CLI arg + dashboard Outbound Wire panel
- `numa relay [PORT] [BIND]` accepts an optional bind address (defaults
  to 127.0.0.1, matching the Caddy reverse-proxy deployment shape).
  Required for Docker, where the relay needs 0.0.0.0 inside the
  container so Caddy can reach it across the bridge network.

- Dashboard now surfaces the upstream_transport dimension as an
  "Outbound Wire" panel alongside the existing "Inbound Wire" (renamed
  from "Transport" for directional clarity). Sub-headers — "apps → numa"
  / "numa → internet" — make the threat-model split obvious without
  jargon. Bars: UDP/DoH/DoT/ODoH, headline "X% encrypted outbound".
  The PR description's promise that "the dashboard answers how much of
  my DNS traffic left in cleartext honestly" is now true.
2026-04-20 15:44:20 +03:00
Razvan Dimescu
cf128c19af feat(odoh): bootstrap-IP overrides + zero hedge for ODoH (post-deploy fixes)
Two issues surfaced from running mode = "odoh" against the live Hetzner
relay as system DNS:

1. **Bootstrap deadlock.** The reqwest HTTPS client resolves the relay
   and target hostnames via system DNS. When numa is itself the system
   resolver, the ODoH client loops trying to resolve through itself.
   Adds optional `relay_ip` and `target_ip` to `[upstream]`, plumbed
   into reqwest's `resolve()` so the HTTPS client bypasses system DNS
   for those two hostnames. TLS still validates against the URL
   hostname, so a stale IP fails loudly rather than silently MITM'ing.

2. **2x relay load.** Default `hedge_ms = 10` triggers a duplicate
   in-flight query for every request. Useful for UDP/DoH/DoT (rescues
   tail latency cheaply); wasteful for ODoH (doubles HPKE seal/unseal,
   doubles sealed-byte footprint a passive observer can correlate, no
   latency win — relay hop dominates either way). Force-zero in
   oblivious mode regardless of configured hedge_ms.

Validated end-to-end against odoh-relay.numa.rs → Cloudflare:
3 digs produced 3 forwarded_ok on the relay (was 6 before the hedge
fix), upstream_transport.odoh ticks correctly.
2026-04-20 15:44:09 +03:00
Razvan Dimescu
241c40553b feat(odoh): ship ODoH client + self-hosted relay (RFC 9230)
Client (mode = "odoh"): URL-query target routing per RFC 9230 §5,
/.well-known/odohconfigs TTL cache with 60s backoff on failure, HPKE
seal/open via odoh-rs, strict-mode default that SERVFAILs on relay
failure instead of silently downgrading. Host-equality config
validation rejects same-operator relay/target pairs.

Relay (`numa relay [PORT]`): axum server with /relay + /health.
SSRF-hardened hostname validator (RFC 1035 ASCII + dot + dash),
4 KiB body cap at the axum layer, 5s full-transaction timeout, and
static 502 on target failure (reqwest internals logged, not leaked).
Aggregate counters only — no per-request logs.

Observability: new `UpstreamTransport { Udp, Doh, Dot, Odoh }`
orthogonal to `QueryPath`, so /stats can tally wire protocols
symmetrically. Recursive mode records `Some(Udp)` for honest
"bytes egressing in cleartext" accounting.

Tests: Suite 8 exercises the client end-to-end via Frank Denis's
public relay + Cloudflare target; Suite 9 exercises `numa relay`
forwarding + guards against Cloudflare as the real far end. Full
probe script at tests/probe-odoh-ecosystem.sh verifies the entire
public ODoH ecosystem (4 targets + 1 relay per DNSCrypt's curated
list — confirms deploying Numa's relay doubles global supply).
2026-04-20 12:34:04 +03:00
Razvan Dimescu
f6cfb3ce1b Merge pull request #120 from razvandimescu/feat/named-record-types
feat(question): name SVCB/LOC/NAPTR record types in logs
2026-04-19 08:08:54 +03:00
Razvan Dimescu
5725f94ff3 refactor(question): collapse QueryType impls behind define_qtypes! macro
Adding a record type used to require 5 edits across the file (enum
variant, to_num, from_num, as_str, parse_str). The macro takes a
single (variant, num, str) tuple per type and generates the enum
plus all four methods.

UNKNOWN(u16) stays hand-coded since it carries data and can't sit
in the table.

src/question.rs: 156 lines -> 92 lines, no behavior change.
2026-04-19 08:01:18 +03:00
Razvan Dimescu
24610ae3fe feat(question): add SVCB, LOC, NAPTR variants to QueryType
Logs were printing UNKNOWN(64), UNKNOWN(29), UNKNOWN(35) for SVCB,
LOC, and NAPTR — three RR types that have been registered for years
and show up in the wild (notably SVCB via RFC 9462 DDR clients
querying _dns.resolver.arpa).

Adds the variants and replaces the SVCB_QTYPE u16 const introduced
in #119 with QueryType::SVCB.to_num(), matching the HTTPS path.

Closes #114.
2026-04-19 07:49:35 +03:00
Razvan Dimescu
6bc02982f0 Merge pull request #119 from razvandimescu/feat/filter-aaaa
feat(resolver): filter_aaaa for IPv4-only networks
2026-04-19 07:31:27 +03:00
Razvan Dimescu
f9e996ae78 fmt: drop redundant comments per house style 2026-04-19 06:53:47 +03:00
Razvan Dimescu
5e85b147b9 feat(resolver): apply ipv6hint strip to SVCB (type 64) too
HTTPS (65) and SVCB (64) share the RDATA wire format, so the existing
parser already handles both — only the call site was HTTPS-only. Widen
the qtype check and extend the existing pipeline test with a second
query for SVCB.
2026-04-19 06:52:30 +03:00
Razvan Dimescu
d6bb9a0f01 fmt: rustfmt vec literal wrapping + signature collapse 2026-04-19 06:24:54 +03:00
Razvan Dimescu
61ea2e510d refactor: dedupe HTTPS_TYPE, record-walk, and test rdata builder
- Drop `const HTTPS_TYPE: u16 = 65;` in favor of `QueryType::HTTPS.to_num()`
  at the single call site — avoids a fresh magic number alongside the
  existing enum mapping in question.rs.
- Add `DnsPacket::for_each_record_mut` so `strip_https_ipv6_hints` stops
  hand-rolling the answers/authorities/resources walk; future section
  rewrites go through the same helper.
- Promote the SVCB test-rdata builder from `svcb::tests` to module scope
  as `pub(crate) #[cfg(test)] fn build_rdata`, and reuse it in the two
  pipeline tests in ctx.rs — kills ~20 lines of byte-fiddling and keeps
  one RDATA-construction code path.
2026-04-19 05:58:47 +03:00
Razvan Dimescu
22dd3cd222 fix(resolver): skip ipv6hint strip for DO-bit clients
Modifying HTTPS rdata invalidates any accompanying RRSIG, so a DNSSEC-
validating downstream would reject the response as Bogus. Gate the
strip on !client_do, matching the existing DNSSEC-records strip.

Adds a regression test that catches the gate being removed: builds a
query with EDNS DO=1, asserts the HTTPS rdata round-trips untouched.
2026-04-19 05:52:37 +03:00
Razvan Dimescu
8014ebac9e test(integration): add Suite 7 for filter_aaaa + SUITES env filter
Suite 7 exercises the full pipeline end-to-end: A resolves, AAAA returns
NODATA, local [[zones]] AAAA bypasses the filter, and HTTPS ipv6hint is
stripped from a real cloudflare.com response. A second config run with
the flag unset guards against network-failure false-positives.

SUITES=N (comma list) runs a subset, e.g. `SUITES=7 bash tests/integration.sh`
skips suites 1-6 for fast iteration.
2026-04-19 05:52:29 +03:00
Razvan Dimescu
70400187d0 Merge pull request #118 from razvandimescu/feat/linux-drop-privileges
feat(linux): run systemd service as unprivileged numa user
2026-04-18 22:04:53 +03:00
Razvan Dimescu
fb41a6f8b5 test(linux): systemd service install verification
Three scenarios CI cannot run: every advertised port is functional (DNS
resolves, TLS chain validates against numa's CA, HTTP/API respond), CA
fingerprint survives upgrade from pre-drop layout, binary staging
fallback from a 0700 source dir. Self-bootstraps a privileged
systemd-as-PID1 container — no dependency on long-lived test containers.

MainPID user assertion retries until comm=numa to avoid a race where
systemctl reports active while MainPID still points at a transitional
process.
2026-04-18 22:00:54 +03:00
Razvan Dimescu
b02b607fb9 ci(linux): assert numa daemon does not run as root
Locks in the invariant this branch establishes: a regression that
reverts to User=root would otherwise ship green.
2026-04-18 20:07:24 +03:00
Razvan Dimescu
be98a02e49 feat(resolver): filter_aaaa for IPv4-only networks (#112)
When enabled, AAAA queries short-circuit to NODATA (NOERROR + empty
answer) so Happy Eyeballs clients don't stall waiting on a v6 address
they can't use. Also strips `ipv6hint` SvcParam from HTTPS/SVCB
answers (RFC 9460) so Chrome ≥103, Firefox, and Safari don't bypass
the AAAA filter via the HTTPS record path.

Local data is preserved: overrides, zones, the .numa proxy, and the
blocklist sinkhole keep whatever v6 addresses they configure — the
filter only kicks in on the cache/forward/recursive path. NODATA is
correct per RFC 2308 here; NXDOMAIN would incorrectly imply the name
doesn't exist for A queries either.

Off by default. Opt in via `filter_aaaa = true` under `[server]`.
2026-04-18 19:52:06 +03:00
Razvan Dimescu
763131478f fmt: rustfmt format! macro split 2026-04-18 12:15:44 +03:00
Razvan Dimescu
067195f2ab fix(linux): atomic binary copy + restart instead of start on re-install
Re-install failed with ETXTBSY (Text file busy) because std::fs::copy
can't overwrite a binary that's currently being executed by the
running service. Switch to copy-then-rename: write the new binary to
/usr/local/bin/numa.new, then rename over /usr/local/bin/numa. Rename
swaps the path while the running process keeps the old inode alive,
so DNS keeps serving from the previous binary until restart.

Bump systemctl start to restart so the new binary actually loads on
re-install (start is a no-op when the unit is already active, which
would silently leave the old binary running).

Locally verified the full CI sequence: install → curl → reinstall →
curl → uninstall → curl-fails. All three assertions pass.
2026-04-18 12:12:11 +03:00
Razvan Dimescu
e19505aa95 fix(linux): narrow replace_exe_path cfg to macos after Linux inlined the substitution
Linux install_service_linux now does the {{exe_path}} substitution
inline because it uses the (potentially copied) binary path returned
by install_service_binary_linux, not current_exe(). The shared
replace_exe_path helper is dead on Linux — clippy -D warnings caught it.

Narrow the function to macos and split the placeholder test: keep the
"both templates contain {{exe_path}}" assertion as a cross-platform test
(catches placeholder removal on either file), keep the substitution test
gated to macos where the function lives.
2026-04-18 11:57:54 +03:00
Razvan Dimescu
3970a9f45c fix(linux): copy binary to /usr/local/bin when source path isn't world-traversable
DynamicUser=yes' transient account can only traverse world-x directories.
The CI binary at /home/runner/work/numa/numa/target/release/numa fails
exec with EACCES because /home/runner is mode 0700; same applies to a
build under /home/<user>/, ~/.cargo/bin, or any private $HOME tree.

install_service_binary_linux now walks the binary's path. If every
ancestor grants world-execute (Linuxbrew /home/linuxbrew is 0755,
/usr/local/bin is fine, install.sh layout works), keep the source
path so brew/distro upgrades propagate in place. Otherwise copy to
/usr/local/bin/numa and reference that in the unit.

Locally verified both branches in an Ubuntu 24.04 systemd container:
- CI-like /home/runner (0700) → copies + service binds 5380
- Brew-like /home/linuxbrew (0755) → keeps source path + service binds 5380
2026-04-18 11:51:32 +03:00
Razvan Dimescu
7b9db9e889 fix(linux): drop ProtectHome=true — blocks exec when binary lives under /home
Integration-linux journalctl showed status=203/EXEC: systemd couldn't
exec /home/runner/work/numa/numa/target/release/numa because
ProtectHome=yes makes /home invisible to the sandboxed process. My
local Docker test passed because the binary was at /workspace, not
/home.

DynamicUser=yes already implies ProtectHome=read-only, which preserves
exec access to binaries living under /home (cargo install, source
builds, CI) while blocking writes to user $HOMEs. Keep that default
rather than over-restricting.

Follow-up worth tracking: install_service_linux could copy the binary
to /usr/local/bin/numa the way Windows does at windows_service_exe_path,
making the unit's ExecStart independent of where `numa install` was
invoked from — then we could set ProtectHome=yes again.
2026-04-18 08:54:34 +03:00
Razvan Dimescu
dfeca53e21 ci: dump journalctl + systemctl status on integration-linux failure 2026-04-18 08:48:53 +03:00
Razvan Dimescu
4f6159d961 refactor(linux): switch to DynamicUser=yes, drop install-time user creation
AUR installs never call `numa install` — PKGBUILD drops the unit straight
into /usr/lib/systemd/system and the user runs `systemctl enable numa`.
With User=numa the Rust installer's useradd code never fires there,
breaking Arch out of the box.

DynamicUser=yes sidesteps packaging entirely — systemd allocates a
transient UID per start and remaps StateDirectory ownership (including
legacy root-owned trees) automatically. Works on any modern systemd.

Drops the ensure_numa_user_linux/chown helpers plus NUMA_USER; the
unit file alone now captures the privilege-drop story.
2026-04-18 08:20:07 +03:00
Razvan Dimescu
41aea1dd12 fix(linux): drop risky sandbox directives that break Rust network daemons
Integration test failed with exit 7 on curl to /health after a successful
install — service started but never listened. The likely culprits are
MemoryDenyWriteExecute (breaks jemalloc/some crypto), SystemCallFilter
~@privileged @resources (blocks setrlimit and friends tokio may use),
and RestrictNamespaces/LockPersonality (occasional foot-guns).

Pull them and keep a conservative hardening set that's well-tested with
Rust network services: no-new-privs, protect-system/home, private tmp
and devices, protect-kernel-*, restrict-realtime/suid/address-families.
Layer the aggressive bits back in follow-up PRs once tested individually.
2026-04-18 08:10:04 +03:00
Razvan Dimescu
695a8b963c feat(linux): run systemd service as unprivileged numa user
- numa.service: User=numa + CAP_NET_BIND_SERVICE + sandboxing block
  (ProtectSystem=strict, PrivateTmp, seccomp @system-service, etc)
- install_service_linux: create numa system user + chown data_dir
  before first start so TLS-cert generation and state writes land
  on a numa-owned tree

Runtime verified root-free on Linux — network_watch_loop only reads
/etc/resolv.conf; all system-DNS mutation stays in the installer,
which continues to run as root via sudo.
2026-04-18 07:56:59 +03:00
Razvan Dimescu
34e2182ae4 Merge pull request #104 from razvandimescu/feat/forwarding-array-upstream
feat: accept array of upstreams in [[forwarding]]
2026-04-17 23:25:04 +03:00
Razvan Dimescu
5f77af55e9 fix(forward): track SRTT for DoT upstreams, not just UDP
The SRTT ordering + failure penalty path was UDP-only, so a DoT primary
in a forwarding-rule pool was never deprioritized on failure and all
DoT entries tied at INITIAL_SRTT_MS in the sort key. With [[forwarding]]
now accepting arrays of upstreams, DoT pools are a first-class case and
need the same healthiest-first behavior the default pool gets for UDP.

- Add Upstream::tracked_ip() → Some(ip) for Udp/Dot, None for Doh
  (DoH has no stable IP — reqwest pools connections by hostname).
- Rewire the three SRTT call sites in forward_with_failover_raw.
- Hoist srtt.read() out of the candidate-scoring loop — one lock per
  query instead of N (matters now that pools commonly have N>1).
- Drop unused #[derive(Debug)] on UpstreamPool and ForwardingRule.
- Regression tests: udp_failure_records_in_srtt + dot_failure_records_in_srtt.
2026-04-17 03:39:21 +03:00
Razvan Dimescu
ab6cda0c91 Merge branch 'main' into feat/forwarding-array-upstream
Resolves src/main.rs conflict: serve loop was extracted into src/serve.rs on main (PR #107). Ported the forwarding-rule log change to serve.rs — fwd.upstream is now Vec<String>, logged with join(", ").
2026-04-17 03:14:09 +03:00
Razvan Dimescu
f9ce82f4b0 Merge pull request #107 from razvandimescu/feat/windows-service
feat(windows): run as a real SCM service, not a Run-key autostart
2026-04-17 02:02:43 +03:00
Razvan Dimescu
1d9495c013 ci: bridge DNS gap with direct upstream instead of polling
systemd-resolved has a ~40s reconfiguration stall after restart
(systemd #22521) that breaks the GHA runner's persistent connection
to results-receiver.actions.githubusercontent.com. Polling for DNS
recovery isn't enough since the .NET runner agent caches DNS at the
connection-pool level. Replace the broken stub-resolv symlink with a
direct upstream so DNS works instantly.
2026-04-17 01:32:36 +03:00
Razvan Dimescu
34b75833b8 ci: poll for DNS recovery in cleanup, not test step
Move DNS recovery wait into the cleanup step (if: always) so it runs
regardless of test outcome. Use getent hosts loop instead of sleep+dig
to match what post-steps actually use for resolution.
2026-04-17 01:11:20 +03:00
Razvan Dimescu
99af97a67b ci: wait for DNS recovery after uninstall on Linux
systemd-resolved needs a moment to restore its stub listener after
the numa drop-in is removed. Without a wait, the runner can't resolve
GitHub's API to report job completion.
2026-04-16 20:20:53 +03:00
Razvan Dimescu
9e56054f37 ci: add integration tests for install/uninstall lifecycle
Release-build + install/verify/re-install/uninstall cycle on Linux and
macOS. Runs after lint/test passes (needs dependency). Cleanup step
uses if: always() to handle cancellation.
2026-04-16 19:56:44 +03:00
Razvan Dimescu
fe9f31616e test: add SCM output parsing and config path regression tests
Extract parse_sc_registered and parse_sc_state as testable pure
functions. 8 new tests covering: service registration detection,
service state parsing, and Windows config_dir == data_dir invariant.
2026-04-16 19:31:26 +03:00
Razvan Dimescu
9f08d8b489 fix(windows): stop service before port probe, wait for full exit
Stop the running service before disabling Dnscache so the port 53 probe
sees the real state (not Numa's own binding). Wait for SCM STOPPED
state before copying the binary to avoid os error 32 (file in use).
2026-04-16 19:21:56 +03:00
Razvan Dimescu
9bea038cb6 fix(windows): unify config/data dir and add service log file
config_dir() on Windows now returns data_dir() (ProgramData) so config,
services.json, and log file are in the same place for both interactive
and service contexts. Service mode writes logs to numa.log via
env_logger pipe. Dashboard shows correct log path per OS.
2026-04-16 19:12:42 +03:00
Razvan Dimescu
f0a1dd7106 fix(dashboard): hide logs path on Windows (no log sink yet) 2026-04-16 19:01:34 +03:00
Razvan Dimescu
6789c321bc fix(windows): defer DNS redirect until port 53 is free
Probe port 53 after disabling Dnscache instead of assuming reboot is
needed. Skip DNS redirect when port is blocked (service does it on
first boot). Fix readiness probe: TCP connect to API port instead of
broken UDP send_to that always succeeded.
2026-04-16 18:35:09 +03:00
Razvan Dimescu
da40a8dbfc ci: fetch full history on Windows so build.rs embeds git SHA 2026-04-16 18:08:48 +03:00
Razvan Dimescu
65e65028a0 fix(windows): separate service lifecycle from install flow
service start/stop/restart/status now map to proper SCM operations
instead of re-running the full install/uninstall flow. On re-install,
stop the running service first so the binary can be overwritten.
2026-04-16 16:59:54 +03:00
Razvan Dimescu
d3eab73a31 fix: use sort_by_key to satisfy clippy unnecessary_sort_by 2026-04-16 16:13:15 +03:00
Razvan Dimescu
22ec684e48 Merge remote-tracking branch 'origin/main' into feat/windows-service
# Conflicts:
#	src/main.rs
2026-04-16 16:06:49 +03:00
Razvan Dimescu
aa040fd8a4 Merge pull request #111 from razvandimescu/fix/allowlist-input-focus
fix(dashboard): allowlist input erased by polling refresh
2026-04-16 15:27:02 +03:00
Razvan Dimescu
b69cc89d38 fix(dashboard): skip allowlist re-render while input has focus
The polling refresh replaced the entire allowlist panel innerHTML every
2 seconds, destroying the input field mid-typing. Users had to
paste-and-enter faster than the refresh interval — #106 reported this
as text "timing out and erasing."

Guard: skip renderAllowlist() when allowDomainInput has focus.
2026-04-16 15:12:00 +03:00
Razvan Dimescu
ebb801650e Merge pull request #110 from razvandimescu/feat/build-version
feat: embed git SHA in version string
2026-04-16 13:41:23 +03:00
Razvan Dimescu
30bb7365c9 refactor: robust git-describe parsing for pre-release tags
Switch to --long flag so format is always TAG-N-gSHA[-dirty], then
split from the right. Handles pre-release tags (v0.14.0-rc1) that
broke the previous left-split approach. Remove ineffective directory
watch on .git/refs/tags/. Trim comments.
2026-04-16 13:18:56 +03:00
Razvan Dimescu
0118ab0f44 feat: embed git SHA in version string via build.rs
Adds a build.rs that runs `git describe --tags --always --dirty` and
sets NUMA_BUILD_VERSION at compile time. A new `numa::version()` helper
returns the build version, falling back to CARGO_PKG_VERSION when git
is unavailable (source tarballs, Docker builds without .git).

Version strings:
  tagged release:      0.13.1
  commits ahead:       0.13.1+a87f907
  uncommitted changes: 0.13.1+a87f907-dirty
  no git:              0.13.1

Replaces all 6 inline env!("CARGO_PKG_VERSION") call sites with the
single version() function.
2026-04-16 13:02:25 +03:00
Razvan Dimescu
a87f907d20 Merge pull request #109 from razvandimescu/feat/dashboard-version
feat(dashboard): version in header, restructure footer
2026-04-16 11:29:54 +03:00
Razvan Dimescu
1c5e703330 fix(dashboard): collapse header on mobile (≤700px)
Hide tagline, version tag, and Phone Setup on narrow viewports so
the header stays single-row: logo + status dot + blocking toggle.
Reduces logo font-size from 1.8rem to 1.4rem on mobile.
2026-04-16 06:39:29 +03:00
Razvan Dimescu
cc635f2f73 feat(dashboard): show version in header, restructure footer
Closes #108.

- Add `version` field to /stats (from CARGO_PKG_VERSION).
- Show `v0.13.1` next to the Numa wordmark in the dashboard header.
- Restructure the footer into two semantic rows:
  Row 1 (paths): Config · Data · Logs (platform-detected)
  Row 2 (runtime): Upstream · DNSSEC · SRTT · GitHub
- Drop Mode from the footer (redundant with Upstream label).
- Show only the matching-platform log path instead of both
  macOS and Linux unconditionally.
2026-04-16 06:15:48 +03:00
Razvan Dimescu
7bb484ada3 refactor(windows): deduplicate after simplify review
- Drop the duplicate WINDOWS_SERVICE_NAME constant; call sites use the
  single source of truth at windows_service::SERVICE_NAME.
- windows_service_exe_path and service_config_path now compose from
  crate::data_dir() instead of re-parsing %PROGRAMDATA% locally.
- Factor the 6× sc.exe invocation boilerplate into a run_sc helper.
- Replace the 200ms try_recv polling loop in the service dispatcher
  with a recv_timeout wait — cuts shutdown latency and idle CPU.
- stop_service_scm/delete_service_scm now log warnings instead of
  silently swallowing failures, so unexpected errors are visible.
2026-04-15 23:48:09 +03:00
Razvan Dimescu
b610160cd1 feat(windows): run numa as a real SCM service, drop Run-key autostart
Hooks the service-dispatcher scaffolding from the previous commit to
actually serve DNS, and replaces the HKLM\…\Run login-time autostart
with a proper Windows service created via sc.exe.

**Refactor**
- Extract main.rs's inline server body (~500 lines) into `numa::serve::run`
  so both the interactive CLI entry and the service dispatcher drive the
  same startup/serve loop. main.rs is now a thin subcommand router.
- main.rs goes sync (no #[tokio::main]); each branch that needs async
  builds its own runtime and block_on's. Required so the --service path
  can hand off to SCM without fighting tokio for the entry thread.

**Windows service wrapper**
- `numa::windows_service::run_service` now builds a multi-thread tokio
  runtime on a dedicated thread and runs `serve::run` inside it. Stop/
  Shutdown from SCM aborts the wait loop and reports SERVICE_STOPPED.
- Config path resolves to `%PROGRAMDATA%\numa\numa.toml` when running
  under SCM (SYSTEM's cwd is System32, relative paths don't work).

**Install/uninstall**
- `install_windows` now copies numa.exe to a stable
  `%PROGRAMDATA%\numa\bin\numa.exe` and registers it via `sc create`
  with start=auto, obj=LocalSystem, and a failure policy of
  restart/5000/restart/5000/restart/10000. Starts the service
  immediately when no reboot is pending.
- `uninstall_windows` stops + deletes the service and removes the
  binary copy before restoring DNS.
- Drops the old `register_autostart` / `remove_autostart` helpers that
  wrote to `HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run` — that
  path runs at user login in the user's session with no stderr capture
  and no crash-restart policy, which is why we've been flying blind in
  every Windows debug session.

DNS-set bugs (netsh destructive static, IPv6 not touched, uninstall
secondary-drop) and file logging are orthogonal — tracked for follow-up.
2026-04-15 22:24:23 +03:00
Razvan Dimescu
cea4b0ef88 feat(windows): add windows-service crate + SCM dispatcher scaffold
Lets numa.exe act as a real Windows service registered with the SCM,
replacing the HKLM\...\Run login-time autostart that runs in the user
session without stderr capture.

- New `numa::windows_service` module (cfg(windows)) wraps Mullvad's
  `windows-service` crate: registers with SCM, reports Running, handles
  Stop/Shutdown, reports Stopped.
- `numa.exe --service` is the entry point SCM uses
  (`sc create … binPath="numa.exe --service"`); interactive invocations
  are unchanged.
- Dep is gated `[target.'cfg(windows)'.dependencies]` — zero impact on
  macOS/Linux builds or binary size.

Scaffold only. The service currently blocks on an mpsc channel until
Stop arrives; the actual serve loop will hook in once main.rs's inline
server body is extracted into `numa::serve(config_path)` in a follow-up.
This lets `sc start Numa` / `sc stop Numa` be verified end to end today.
2026-04-15 22:14:36 +03:00
Razvan Dimescu
4afc56a052 Merge main into feat/forwarding-array-upstream
Resolves conflict in src/ctx.rs — both sides added independent tokio
tests (forwarding fail-over on this branch, default-pool upstream path
on main from #103). Keep both.
2026-04-15 21:28:04 +03:00
Razvan Dimescu
43a5ca4bd5 Merge pull request #105 from razvandimescu/chore/audit-rustls-webpki
chore(deps): bump rustls-webpki to 0.103.12
2026-04-15 14:41:19 +03:00
Razvan Dimescu
b403671e11 chore(deps): bump rustls-webpki to 0.103.12
Patches RUSTSEC-2026-0098 (URI name constraints incorrectly accepted)
and RUSTSEC-2026-0099 (wildcard cert name constraints), both published
2026-04-14. Transitive via reqwest / rustls / hickory / quinn.
2026-04-15 14:27:17 +03:00
Razvan Dimescu
6f0144b237 Merge pull request #103 from razvandimescu/feat/upstream-log-label
feat: distinguish UPSTREAM vs FORWARD in logs and stats
2026-04-15 14:00:28 +03:00
Razvan Dimescu
fef43635d6 fix(ci): rustfmt import order and gate Upstream import for Windows 2026-04-15 04:11:27 +03:00
Razvan Dimescu
9a0d586b13 feat: accept array of upstreams in [[forwarding]]
Mirrors `[upstream] address` — `upstream` accepts string or array
of strings, builds an `UpstreamPool` and routes queries through
`forward_with_failover_raw` so SRTT ordering and failover apply to
matched `[[forwarding]]` rules the same way they do for the default
pool.

Single-string rules keep their current behavior (one-element pool,
equivalent single-upstream path). Empty array errors at config load.

Addresses item 1 of issue #102. Plan: docs/102_item1.md.
2026-04-15 04:03:38 +03:00
Razvan Dimescu
4bd08e206d feat(dashboard): hide zero-count path and transport rows 2026-04-14 21:25:11 +03:00
Razvan Dimescu
ebb2a5db39 refactor: simplify upstream-path test — reuse pool mutex, drop narrating comment 2026-04-14 18:26:45 +03:00
Razvan Dimescu
e0e0f50838 feat: distinguish UPSTREAM vs FORWARD in logs and stats
Queries matching a [[forwarding]] suffix rule now log as FORWARD;
queries resolved via the default [upstream] pool log as UPSTREAM.
Previously both paths shared the FORWARD label, making it impossible
to tell from logs whether a rule matched.

Adds QueryPath::Upstream, a queries.upstream stats counter exposed
via /stats, plus a matching dashboard filter, bar, and path tag.

Closes part of #102.
2026-04-14 18:18:32 +03:00
Razvan Dimescu
120ba5200e chore: bump version to 0.13.1 2026-04-14 13:31:35 +03:00
Razvan Dimescu
45046bcf6e Merge pull request #101 from razvandimescu/fix/forward-tls-upstream
fix: accept tls:// and https:// in [[forwarding]] upstreams
2026-04-14 13:09:58 +03:00
Razvan Dimescu
b4b939c78b fix: accept tls:// and https:// in [[forwarding]] upstreams
Config-level forwarding rules were parsed with the UDP-only
`parse_upstream_addr` helper, silently rejecting the DoT/DoH schemes
that the rest of the forwarding pipeline already supports.

Widen `ForwardingRule.upstream` from `SocketAddr` to `Upstream` so
config rules reuse the same parser as `[upstream].address` and
`fallback`. Demote `parse_upstream_addr` to `pub(crate)` to prevent
the same mistake recurring.

Closes #100.
2026-04-14 09:22:24 +03:00
Razvan Dimescu
9a85e271ec Merge pull request #99 from razvandimescu/fix/aur-llvm-libs
fix: add llvm-libs to AUR makedepends
2026-04-13 17:09:08 +03:00
Razvan Dimescu
7dc1a0686f fix: add llvm-libs to AUR makedepends
Fixes #97 — on minimal Arch installs, rustc fails with
"error while loading shared libraries: libLLVM.so" because
llvm-libs isn't pulled in transitively.
2026-04-13 15:58:52 +03:00
Razvan Dimescu
a02722cdf9 Merge pull request #98 from razvandimescu/docker-support
feat: Docker support with multi-arch GHCR images
2026-04-13 15:53:56 +03:00
Razvan Dimescu
3b77dcff61 feat: Docker support — multi-arch GHCR images on release
Add CI workflow to build linux/amd64 + linux/arm64 images and push to
ghcr.io/razvandimescu/numa on tag. Fix Dockerfile (missing benches/),
bake container-aware config (API + proxy bind 0.0.0.0), add Docker
section to README.
2026-04-13 15:48:29 +03:00
Razvan Dimescu
7cc110a0a1 ci: skip CI and AUR builds for blog/site-only changes
Add paths-ignore for site/, blog/, drafts/, *.md, and blog scripts
so content-only pushes don't trigger cargo builds or AUR publishes.
2026-04-13 15:02:19 +03:00
Razvan Dimescu
75fe625f39 blog: drop redundant Numa intro from opening paragraph 2026-04-13 14:48:34 +03:00
Razvan Dimescu
908d076d9b blog: pain-first opening, I-voice, forward-looking close
- Open with shared reqwest pain, not the tool name
- Switch "we" to "I" for personal voice (playbook: solo dev > corporate)
- Replace Unbound feature-gap excuses with what I'm exploring next
  (persistent SRTT, aggressive NSEC, adaptive hedge delays)
- Add context line linking hero cards to the recursive section
2026-04-13 14:37:24 +03:00
Razvan Dimescu
5381e65be4 Merge pull request #96 from razvandimescu/blog/fixing-doh-tail-latency
blog: fixing DoH tail latency post
2026-04-13 14:08:40 +03:00
Razvan Dimescu
6b0a30d004 blog: add fixing DoH tail latency post + blog infrastructure
New post on reqwest HTTP/2 window tuning and request hedging
(Dean & Barroso's "The Tail at Scale" applied to DNS forwarding).
Covers DoH forwarding p99 improvement and cold recursive
resolution from 2.3s to 538ms.

Also adds blog build infrastructure: index generation script,
draft preview server, hero metrics/before-after CSS, and
normalizes date format across existing posts.
2026-04-13 13:49:40 +03:00
Razvan Dimescu
169679bfe4 Merge pull request #95 from razvandimescu/fix/forwarding-precedes-special-use
fix: forwarding rules override special-use NXDOMAIN
2026-04-13 09:37:19 +03:00
Razvan Dimescu
d3f046da4c style: assert loopback addr in subdomain test, trim verbose comment 2026-04-13 08:10:26 +03:00
Razvan Dimescu
0bdde40f40 test: verify forwarded response content from mock upstream 2026-04-13 08:07:58 +03:00
Razvan Dimescu
155c1c4da0 test: full-pipeline coverage for every resolve_query step
Test each pipeline stage in isolation through resolve_query:
- override takes precedence over all other paths
- localhost and *.localhost resolve to loopback
- local zone returns configured records
- .tld proxy resolves registered services to loopback
- blocklist sinkholes to 0.0.0.0
- cache hit returns stored response without upstream
2026-04-13 08:04:59 +03:00
Razvan Dimescu
b40004fe5e refactor: extract shared test infrastructure into testutil module
- test_ctx(): single ServerCtx builder, replaces 3 copies (ctx/api/dot)
- mock_upstream(): canned DNS response server for forwarding tests
- blackhole_upstream(): unresponsive socket for timeout tests
- Removes ~100 lines of duplicated 30-field struct literals
2026-04-13 07:56:47 +03:00
Razvan Dimescu
b8ddc16027 refactor: return QueryPath from resolve_query, add mock upstream to tests
resolve_query now returns (BytePacketBuffer, QueryPath) so callers
and tests can inspect the resolution path without reading the query
log. Production call sites (UDP, DoT, DoH) destructure and ignore it.

The forwarding test now uses a mock UDP upstream that replies with a
canned response, asserting QueryPath::Forwarded instead of != Local.
2026-04-13 07:51:14 +03:00
Razvan Dimescu
48f67be2f1 refactor: deduplicate test_ctx by delegating to test_ctx_with_forwarding 2026-04-13 07:39:55 +03:00
Razvan Dimescu
ca00846393 fix: forwarding rules override special-use NXDOMAIN for private PTR zones
Explicit [[forwarding]] rules now take precedence over the RFC 6303
special-use domain intercept. Previously, PTR queries for private
ranges (e.g. 168.192.in-addr.arpa) always returned local NXDOMAIN
even when a forwarding rule pointed them at a corporate DNS server.

Add full-pipeline resolve_query test harness (test_ctx + resolve_in_test)
and two tests covering both the default behavior and the override.

Closes #94
2026-04-13 07:36:53 +03:00
Razvan Dimescu
4d4e48bbd6 chore: bump version to 0.13.0 2026-04-13 01:05:20 +03:00
Razvan Dimescu
724c4a6017 Merge pull request #91 from razvandimescu/docs/readme-update
docs: update README with v0.13.0 features
2026-04-13 01:03:16 +03:00
Razvan Dimescu
2b29a44ee0 docs: remove unfair NextDNS comparison from performance section
Comparing local cache (0.8ms) vs a remote service (37ms) measures
network latency, not resolver quality. Any local resolver would
show the same advantage. Replaced with AdGuard Home comparison
which is a fair local-to-local benchmark.
2026-04-13 01:02:10 +03:00
Razvan Dimescu
588e5226fd Merge pull request #92 from razvandimescu/bench/vs-adguard
bench: add --vs-adguard comparison mode
2026-04-13 01:00:20 +03:00
Razvan Dimescu
501902d569 bench: add --vs-adguard mode for Numa vs AdGuard Home comparison
AdGuard Home on port 5457, both forwarding via DoH. Cached queries
tied at 0.1ms. On degraded networks hedging hurts p99 (28ms vs 10ms
without) — both requests pay the same high RTT with no random spikes
to rescue. On clean networks hedging wins.
2026-04-13 00:56:58 +03:00
Razvan Dimescu
77d2c8bbcd docs: update README comparison table, performance, and roadmap
- Comparison table: add DoH/DoT upstream, DoH server, request hedging,
  serve-stale + prefetch, conditional forwarding rows
- Performance: update with current benchmark numbers (0.1ms cached,
  47x NextDNS, p99 -28% vs Unbound)
- Roadmap: add hedging, serve-stale, conditional forwarding, DoT upstream
- Fix broken benchmarks link (bench/ → benches/)
2026-04-13 00:18:52 +03:00
Razvan Dimescu
274338e7f9 Merge pull request #88 from razvandimescu/fix/doh-loopback-san
fix: DoH endpoint accepts loopback, TLS cert includes IP SANs
2026-04-13 00:03:30 +03:00
Razvan Dimescu
305935ed98 style: rustfmt strip_port 2026-04-12 23:59:51 +03:00
Razvan Dimescu
bd505813b6 test: verify TLS cert SANs (wildcard, services, loopback, localhost, bare TLD)
Parse the generated DER cert with x509-parser to assert the exact SAN
set, catching silent try_into() failures that a params-level test
would miss.
2026-04-12 23:54:55 +03:00
Razvan Dimescu
115a55b199 fix: bracketed IPv6, localhost SAN, split host-check helpers
- is_doh_host split into strip_port + is_loopback_host + is_tld_match
- strip_port handles bracketed IPv6 ([::1]:443) and rejects bare IPv6
- Add [::1] to accepted loopback hosts, add localhost DNS SAN to cert
- Remove dead sans.is_empty() guard (loopback IPs always present)
2026-04-12 23:54:26 +03:00
Razvan Dimescu
3665deb56b fix: accept loopback addresses for DoH and add IP SANs to TLS cert
The DoH endpoint rejected requests with Host: 127.0.0.1/::1/localhost,
and the generated TLS cert had no IP SANs — so browsers couldn't use
https://127.0.0.1/dns-query even with the CA trusted.

- is_doh_host now accepts 127.0.0.1, ::1, localhost (with optional port)
- TLS cert includes 127.0.0.1 and ::1 IP SANs, plus bare TLD DNS SAN

Closes #87
2026-04-12 23:54:26 +03:00
Razvan Dimescu
c074d728e9 Merge pull request #90 from razvandimescu/feat/wire-forwarding-hedging
feat: transport protocol tracking with dashboard visualization
2026-04-12 23:38:57 +03:00
Razvan Dimescu
2101dfcf17 feat: transport protocol tracking (UDP/TCP/DoT/DoH) with dashboard visualization
Thread Transport enum through resolve pipeline, record per-query
transport in stats and query log. Dashboard gets bar chart panel
with encryption %, transport column in query log, and filter dropdown.
2026-04-12 22:14:26 +03:00
Razvan Dimescu
27dc53aebb Merge pull request #85 from razvandimescu/feat/wire-forwarding-hedging
feat: wire-level forwarding, cache, and request hedging
2026-04-12 22:02:45 +03:00
Razvan Dimescu
8085c10687 docs: document hedge_ms, tls:// upstream, update max_entries default in numa.toml 2026-04-12 21:37:59 +03:00
Razvan Dimescu
02e1449a45 feat: enable request hedging for all upstream protocols
Hedging was DoH-only (hyper dispatch spike mitigation). Now applies to
UDP (rescues packet loss) and DoT (rescues TLS handshake stalls) too.
Same-upstream hedging: fires a second independent request after hedge_ms
delay. First response wins. Disable with hedge_ms = 0.
2026-04-12 21:34:47 +03:00
Razvan Dimescu
50828c411a fix: cold benchmark uses 1 round per domain for genuine cold measurements
With ROUNDS=10, only the first query per domain was truly cold — the
other 9 hit cached NS delegations at <1ms, diluting the median to
0.4ms. Now cold mode uses 1 round so every sample is a real cold
resolve. Also extracted compare_two_rounds to support per-mode rounds.
2026-04-12 21:00:24 +03:00
Razvan Dimescu
5184891985 fix: cold benchmark cache-busting with PID prefix and flush
Re-runs of --vs-unbound-cold were hitting stale cache entries from
prior runs. The static COUNTER reset to 0 each process, generating
the same c0.example.com subdomains. With the 1-hour stale window,
entries from 10 minutes ago served as stale hits.

Fix: prefix with PID (r{pid}-c{n}.domain) and flush Numa's cache
before cold benchmarks.
2026-04-12 20:50:04 +03:00
Razvan Dimescu
6d9ee14ea6 refactor: unify warm_stale/warm_domain, remove raw_wire alloc, add Freshness enum
- Extract refresh_entry in ctx.rs — warm_domain in main.rs now delegates
  to it instead of duplicating the resolve+cache logic (~40 lines removed)
- Eliminate unconditional .to_vec() of raw wire on every UDP/DoT query —
  pass &buffer.buf[..len] directly (zero-cost for cache hits)
- Replace bare bool stale flag with Freshness enum (Fresh/NearExpiry/Stale)
  making the three states self-documenting at every call site
2026-04-12 19:56:42 +03:00
Razvan Dimescu
3c49b0e65d fix: deduplicate background refresh with per-domain guard
Multiple stale queries for the same domain now spawn only one background
refresh. A HashSet<(String, QueryType)> on ServerCtx tracks in-flight
refreshes; subsequent stale hits for the same key skip the spawn.
2026-04-12 19:49:23 +03:00
Razvan Dimescu
8ef95383a2 feat: prefetch at <10% TTL remaining, add stale behavior tests
Entries with <10% TTL remaining are now marked stale on lookup,
triggering a background refresh before they expire. Combined with
the serve-stale + background refresh from the previous commit, this
means entries are proactively refreshed — matching Unbound's prefetch
behavior.
2026-04-12 19:46:14 +03:00
Razvan Dimescu
571ce2f013 feat: background refresh on stale cache hit (RFC 8767 revalidation)
When a cached entry is expired but within the 1-hour stale window,
serve it immediately with TTL=1 AND spawn a background re-resolve.
The next query gets a fresh entry instead of another stale serve.

Without this, stale entries were served repeatedly for up to an hour
with no refresh — effectively ignoring TTL.
2026-04-12 19:42:56 +03:00
Razvan Dimescu
043a7e1ba5 feat: raise cache default to 100K entries, evict stalest instead of dropping
The 10K cap was too conservative — the blocklist alone holds 400K domains.
At ~100 bytes per wire entry, 100K entries is ~10MB.

When the cache is full and evict_expired doesn't free enough slots,
evict_stalest removes the entry with the least remaining TTL instead of
silently discarding the new insert.
2026-04-12 19:23:28 +03:00
Razvan Dimescu
05d5a5145f refactor: remove unused extract_question and read_wire_qname from wire.rs 2026-04-12 18:46:03 +03:00
Razvan Dimescu
15058aea83 bench: add --vs-nextdns, --vs-unbound-cold modes with mode validation
- --vs-nextdns: Numa local cache vs NextDNS cloud (45.90.28.0)
- --vs-unbound-cold: unique random subdomains, no record cache hits
- check_numa_mode validates forward/recursive mode before running
- numa-bench-recursive.toml config for cold benchmarks
2026-04-12 18:41:09 +03:00
Razvan Dimescu
628ed00074 refactor: extract cache_and_parse, remove dead truncation log, restore TCP_TIMEOUT to 400ms 2026-04-12 18:40:46 +03:00
Razvan Dimescu
85cff052a4 fix: restore TCP_TIMEOUT to 400ms (test race was the real issue) 2026-04-12 18:40:46 +03:00
Razvan Dimescu
67b472fea7 fix: serialize tests that share global UDP_DISABLED state
The tcp_only_iterative_resolution, tcp_fallback_resolves_when_udp_blocked,
tcp_fallback_handles_nxdomain, and udp_auto_disable_resets tests all mutate
global UDP_DISABLED / UDP_FAILURES atomics. Under cargo test parallelism,
udp_auto_disable_resets would reset the flag mid-flight causing other tests
to attempt UDP against TCP-only mock servers and time out.

Fix: static Mutex serializes tests that depend on global UDP state.
Also: tcp_only_iterative_resolution now calls forward_tcp directly,
removing its dependence on the flag entirely.
2026-04-12 18:40:46 +03:00
Razvan Dimescu
700cca9cb6 style: rustfmt warm_domain 2026-04-12 18:40:46 +03:00
Razvan Dimescu
f705f8c49f fix: bump TCP_TIMEOUT to 800ms to fix flaky CI test 2026-04-12 18:40:46 +03:00
Razvan Dimescu
17a1a6ddba refactor: remove forward_with_failover duplication, fix warm-branch hedge bug
- Remove forward_with_failover (parsed): warm_domain now uses _raw + insert_wire
- forward_udp delegates to forward_udp_raw (single UDP socket implementation)
- forward_query uses unified _raw path for all protocols
- Fix send_query_hedged warm branch: bare select! dropped secondary on primary
  error instead of waiting for it — now drains both futures like the cold branch
- Remove pointless raw_len = len rename
2026-04-12 18:40:46 +03:00
Razvan Dimescu
72b540a44a feat: wire-level cache, serve-stale, raw wire passthrough
- Cache stores raw DNS wire bytes + TTL offsets (2.4x memory reduction)
- Serve-stale (RFC 8767): expired entries returned with TTL=1 for 1hr
- handle_query captures raw_len from recv_from for zero-copy forwarding
- resolve_query accepts raw wire bytes, forwards without re-serializing
- wire.rs: TTL offset scanner, ID/TTL patching, question extraction
- 52 wire tests + 16 cache regression tests
2026-04-12 18:40:46 +03:00
Razvan Dimescu
c1b651aa63 chore: remove obsolete bash benchmark script 2026-04-12 18:40:46 +03:00
Razvan Dimescu
5d9a3a809b feat: DoT client, recursive optimization, bench refactor
- Add DoT forwarding client (tls://IP#hostname upstream config)
- Recursive: cache NS delegations, serve-stale (RFC 8767), parallel
  NS queries on cold, no TCP fallback on individual UDP timeouts,
  400ms NS/TCP timeout (down from 800/1500ms)
- Reduce recursive p99 from 2367ms to 402ms (vs Unbound's 148ms)
- Refactor benchmark suite: generic compare_two engine, delete
  one-off diagnostics (1969 → 750 lines)
- Code cleanup: forward_query delegates to _raw, Option<String>
  for tls_name, saturating_sub for ns_idx
2026-04-12 18:40:46 +03:00
Razvan Dimescu
7efac85836 feat: wire-level forwarding, cache, request hedging, and DoH keepalive
Wire-level forwarding path skips DnsPacket parse/serialize on the hot
path. Cache stores raw wire bytes with pre-scanned TTL offsets — patches
ID + TTLs in-place on lookup instead of cloning parsed packets.

Request hedging (Dean & Barroso "Tail at Scale") fires a second
parallel request after a configurable delay (default 10ms) when
the primary upstream stalls. DoH keepalive loop prevents idle
HTTP/2 + TLS connection teardown.

Recursive resolver now hedges across multiple NS addresses and
caches NS delegation records to skip TLD re-queries.

Integration test harness polls /blocking/stats instead of fixed
sleep, eliminating the blocklist-download race condition.
2026-04-12 18:39:48 +03:00
Razvan Dimescu
4f46550283 Merge pull request #89 from razvandimescu/feat/dot-client
feat: DoT (DNS over TLS) client upstream
2026-04-12 18:39:17 +03:00
Razvan Dimescu
05baad0cc0 feat: DoT (DNS over TLS) client upstream
Adds tls:// upstream support for forwarding queries over DNS-over-TLS
(RFC 7858). Parses tls://IP:PORT#hostname format, with default port 853.

- New Upstream::Dot variant with TLS connector
- forward_dot: length-prefixed DNS over TLS stream
- build_dot_connector: system root CAs via webpki-roots
- parse_upstream handles tls:// prefix

Example config:
  address = ["tls://9.9.9.9#dns.quad9.net"]
2026-04-12 18:35:06 +03:00
Razvan Dimescu
7047767dc2 feat: per-suffix conditional forwarding rules (#82) (#84)
* feat: per-suffix conditional forwarding rules in numa.toml (#82)

Adds a `[[forwarding]]` config section so users can explicitly route
domain suffixes to specific upstreams. Config-declared rules take
precedence over auto-discovered rules (macOS scutil, Linux search
domains) via first-match semantics.

Example — the reporter's reverse-DNS case:

  [[forwarding]]
  suffix = "168.192.in-addr.arpa"
  upstream = "100.90.1.63:5361"

Bare IPs default to port 53. IPv6 is supported via
parse_upstream_addr. ForwardingRule::new() constructor replaces
direct struct-literal construction, and make_rule() now delegates
to parse_upstream_addr to fix a latent IPv6 parsing bug.

* feat: accept suffix as string or array in [[forwarding]] rules

Reuses existing string_or_vec deserializer so users can write:
  suffix = ["168.192.in-addr.arpa", "onsite"]
instead of repeating [[forwarding]] blocks per suffix.

* style: rustfmt

* refactor: drop config_count from merge_forwarding_rules return

Log config rules directly from config.forwarding before merging,
keeping the merge API clean of logging concerns.
2026-04-12 06:12:08 +03:00
Razvan Dimescu
22bebb85a0 fix: config path advisory ignores XDG file on interactive root (#81) (#83)
Port-53 and TLS-data-dir advisories told users to create
~/.config/numa/numa.toml, but config_dir() routed root to
/var/lib/numa/ and load_config never consulted the XDG path, so
the file the user created was silently ignored.

New suggested_config_path() helper prefers $HOME/.config/numa/
when HOME is set (and isn't "/" or empty), with config_dir() as
lazy fallback. Used by both advisories and by load_config as an
additional candidate, so the advised path is the path numa
actually reads. Runtime state (services.json, TLS CA) stays in
FHS — config_dir()/data_dir() are intentionally unchanged to
keep continuity with the installed daemon.

End-to-end replication + regression check in
tests/docker/issue-81.sh: four scenarios (replication and
existing-install, each against main and fix), all matching
expectations.
2026-04-12 02:17:33 +03:00
Razvan Dimescu
289f2b973b chore: remove built blog HTML from tracking (built by CI)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:10:13 +03:00
Razvan Dimescu
fb4cbe0b2a chore: update DoT blog post — mark DoH server as shipped in v0.12.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:08:09 +03:00
Razvan Dimescu
2de1bc2efc chore: bump version to 0.12.0 2026-04-11 12:15:40 +03:00
Razvan Dimescu
156b68de87 fix: replace unscannable QR art with placeholder in blog post (#80)
The Unicode block-character QR code in the DoT blog post can't be
scanned by phone cameras due to HTML font metrics distorting the grid.
Replace with a bordered placeholder box — the dashboard screenshot
already shows a working QR.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 04:17:46 +03:00
Razvan Dimescu
7d6b0ed568 feat: DoH server endpoint + DoT enabled by default (#79)
* chore: document multi-forwarder and cache warming in config and README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: DNS-over-HTTPS server endpoint (RFC 8484)

Serve DoH at POST /dns-query on the existing HTTPS proxy (port 443).
Automatically enabled when proxy TLS is active — no config needed.
Also fix zone map priority so local zones override RFC 6762 .local
special-use handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove GoatCounter analytics from site

GoatCounter domains (goatcounter.com, gc.zgo.at) are blocked by
Hagezi Pro, which is Numa's default blocklist. A DNS privacy tool
should not embed analytics that its own resolver blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: enable DoT listener by default

DoT now starts automatically with `sudo numa`, matching the proxy and
DoH which are already on by default. The self-signed CA infrastructure
is shared with the proxy, so there is no additional setup. This makes
`numa setup-phone` work out of the box.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 04:06:17 +03:00
Razvan Dimescu
7770129589 feat: cache warming — proactive DNS resolution for configured domains (#78)
Resolves A + AAAA at startup for domains listed in [cache] warm,
then re-resolves before TTL expiry (at 75% elapsed). Keeps critical
domains always hot in cache with zero client-visible latency.

Closes #34 (item 4)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 01:14:04 +03:00
Razvan Dimescu
8abcd91f95 feat: multi-forwarder with SRTT-based failover (#77)
* feat: multi-forwarder with SRTT-based failover

address accepts string or array, with optional per-server port override.
New fallback pool tried only when all primaries fail. Sequential failover
with SRTT ranking ensures fastest upstream is tried first.

Closes #34 (items 1, 2, 3)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: simplify failover candidate list and deduplicate recursive pool

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract maybe_update_primary for testable upstream re-detection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: rustfmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 00:26:58 +03:00
Razvan Dimescu
a96b84fdeb ci: use pandoc/actions/setup instead of apt-get (#76)
apt-get install pandoc took ~27 minutes due to apt index refresh.
The prebuilt binary action completes in seconds.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 23:16:59 +03:00
Razvan Dimescu
23ff3ce455 chore: blog full QR output + dashboard screenshot, hero script phone setup scene (#75)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 22:34:41 +03:00
Razvan Dimescu
2c20c56421 feat: mobile setup — QR onboarding, Wi-Fi scoped mobileconfig (#73)
* fix: scope mobileconfig DNS to Wi-Fi only via OnDemandRules

Without OnDemandRules, iOS applies the DoT profile globally —
cellular DNS breaks when the phone leaves the LAN.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: phone setup QR code in dashboard header

- Add /qr endpoint serving SVG (uses existing qrcode crate, svg feature)
- Header popover: QR on desktop, direct download link on mobile viewports
- Only visible when [mobile] enabled = true in config
- Expose mobile.enabled and mobile.port in /stats response
- Lazy-load QR on first click, dismiss on outside click

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add Cache-Control to /qr, re-fetch QR on each popover open

Cache-Control: no-store prevents stale QR after LAN IP change.
Remove qrLoaded flag so the QR always reflects the current IP.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: rustfmt serve_qr response tuple

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add iOS install steps to phone setup popover

iOS shows "Profile Downloaded" with no guidance. The popover
now includes the 3-step install flow including the buried
Certificate Trust Settings toggle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 22:21:51 +03:00
Razvan Dimescu
921ed68d54 fix: allowlist parent domain unblocks subdomains (#74)
* fix: allowlist parent domain unblocks subdomains in blocklist

The allowlist walk-up was interleaved with the blocklist walk-up,
so an exact blocklist match on www.example.com short-circuited
before reaching example.com in the allowlist. Now allowlist is
checked at all parent levels before consulting the blocklist.

Deduplicate is_blocked/check via find_in_set helper; is_blocked
delegates to check. Adds 7 new blocklist tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: rustfmt blocklist tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: zero-alloc is_blocked hot path, normalize trailing dots

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: rustfmt add_to_allowlist

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract normalize() for domain lowering + dot stripping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 21:43:40 +03:00
Razvan Dimescu
8da03b1b8c chore: GoatCounter analytics, README v0.11.0, DoT blog post (#72)
* chore: GoatCounter analytics, README for v0.11.0, DoT blog post

- Add GoatCounter script to site pages (cookie-free, no consent needed)
- Update README: setup-phone section, DoT blog link, roadmap checkbox
- Add DoT blog post source and SVG assets

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: site navbar, updated roadmap, DoT blog listing, spec updates

- Add top nav bar to landing page (wordmark + links, responsive)
- Add DoT blog post entry to blog index
- Update roadmap: phases 8-13 (hostile-network, Windows, DoT shipped)
- Update specs: listeners section, dependency description, port list
- Blog: hostile-network SVG in DNSSEC post, text-transform fix
- Blog template: wordmark text-transform fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:23:11 +03:00
Razvan Dimescu
652fca5b80 chore: bump version to 0.11.0 2026-04-10 19:10:58 +03:00
Razvan Dimescu
de15b32325 feat: numa setup-phone — QR-based mobile DoT onboarding (#38)
* feat: numa setup-phone — QR-based mobile DoT onboarding

Adds a CLI subcommand that generates a one-time mobileconfig profile
containing both the Numa local CA (as a com.apple.security.root payload)
and the DoT DNS settings, then serves it via a temporary HTTP server
and prints a scannable QR code in the terminal.

Flow:
  1. User runs `numa setup-phone` (no sudo needed)
  2. Detects current LAN IP, reads CA from /usr/local/var/numa/ca.pem
  3. Builds combined mobileconfig (CA trust + DoT)
  4. Renders QR code with qrcode crate (Unicode block characters)
  5. Serves the profile on port 8765, stays open until Ctrl+C
  6. Counts successful downloads (multi-device households)

Important caveat documented in instructions: even with the CA bundled
in the profile, iOS still requires the user to manually enable trust
in Settings → General → About → Certificate Trust Settings. Verified
on a real iPhone.

Stable PayloadIdentifiers/UUIDs ensure re-running replaces the
existing profile on iOS rather than accumulating duplicates.

- New module: src/setup_phone.rs (~270 lines)
- New CLI subcommand: `numa setup-phone`
- New dependency: qrcode = "0.14" (default-features = false)
- tokio "signal" feature added for Ctrl+C handling
- 3 unit tests: PEM stripping, mobileconfig generation, QR rendering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: mobile API, enriched /health, mobileconfig module

Adds a persistent read-only HTTP listener (default port 8765, LAN-bound)
serving a dedicated subset of Numa's API for iOS/Android companion apps
and as a replacement for the one-shot server setup_phone used to spin up:

  GET /health           — enriched JSON with version, hostname, LAN IP,
                          SNI, DoT config, mobile API port, CA
                          fingerprint, features (shared handler with
                          the main API on port 5380)
  GET /ca.pem           — public CA certificate (shared handler)
  GET /mobileconfig     — full iOS profile (CA trust + DNS settings
                          pinned to current LAN IP)
  GET /ca.mobileconfig  — CA-only iOS profile (trust anchor without
                          DNS override — for the iOS companion app's
                          programmatic DNS flow via NEDNSSettingsManager)

All routes are idempotent GETs. The mobile API never serves the
state-mutating routes that live on the main API (overrides, blocking
toggle, service CRUD, cache flush), so it is safe to expose on the LAN
regardless of the main API's bind address. The CA private key is never
served by any route.

Opt-in via `[mobile] enabled = true`. Default is false so new installs
do not silently expose a LAN listener after upgrading; our committed
numa.toml template enables it explicitly for spike testing.

New modules:

- src/mobileconfig.rs — ProfileMode::{Full, CaOnly} enum with plist
  builder lifted from setup_phone.rs. Full and CaOnly share the CA
  payload UUID (same trust anchor) but have distinct top-level UUIDs
  so they coexist as separate installable profiles on iOS.

- src/health.rs — HealthMeta cached metadata built once at startup
  from config + CA fingerprint (SHA-256 of the PEM via ring), and the
  HealthResponse JSON shape shared between the main and mobile APIs.

- src/mobile_api.rs — axum Router for the persistent listener. Reuses
  api::health and api::serve_ca from the main API; owns the two
  mobileconfig handlers.

Modified:

- src/api.rs — health() returns the enriched HealthResponse, now pub.
  serve_ca is now pub so mobile_api can reuse it.
- src/config.rs — MobileConfig section (enabled, port, bind_addr).
- src/ctx.rs — health_meta: HealthMeta field on ServerCtx.
- src/main.rs — builds HealthMeta at startup, spawns mobile API
  listener if enabled.
- src/lan.rs — build_announcement takes &HealthMeta and writes
  enriched TXT records (version, api_port, proto, dot_port, ca_fp).
  SRV port now reports the mobile API port; peer discovery still
  reads TXT `services=` so this is backwards compatible. Always
  announces even when no .numa services are registered, so the iOS
  companion app can discover Numa via mDNS regardless of service
  state.
- src/setup_phone.rs — reduced from 267 to 100 lines. The CLI is now
  a thin QR wrapper over the persistent /mobileconfig endpoint; the
  hand-rolled one-shot HTTP server (accept_loop, RUST_OK_HEADERS,
  RUST_NOT_FOUND, download counter) is gone.
- src/dot.rs — test fixture updated with HealthMeta::test_fixture().
- numa.toml — commented [mobile] section, enabled = true for spike.

Tests: 136 unit tests passing (5 new in mobileconfig, 3 new in health).
cargo clippy clean. Integration sanity check: curl'd /health, /ca.pem,
/mobileconfig, /ca.mobileconfig against a running numa — all return
200 with correct content types and valid response bodies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: setup-phone probe, unknown command error, query source in dashboard

- setup-phone now probes the mobile API before printing the QR code
  and shows an actionable error if [mobile] is not enabled
- Unknown CLI subcommands print an error instead of silently
  attempting to start a full server
- Dashboard query log shows source IP under timestamp (localhost
  for loopback, full IP for LAN devices) with full addr on hover

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:08:56 +03:00
Razvan Dimescu
6f961c5ec2 fix: push only specific tag when releaseing a new version 2026-04-10 09:03:03 +03:00
71 changed files with 13049 additions and 1195 deletions

View File

@@ -3,8 +3,22 @@ name: CI
on:
push:
branches: [main]
paths-ignore:
- 'site/**'
- 'blog/**'
- 'drafts/**'
- '*.md'
- 'scripts/serve-site.sh'
- 'scripts/generate-blog-index.sh'
pull_request:
branches: [main]
paths-ignore:
- 'site/**'
- 'blog/**'
- 'drafts/**'
- '*.md'
- 'scripts/serve-site.sh'
- 'scripts/generate-blog-index.sh'
env:
CARGO_TERM_COLOR: always
@@ -42,6 +56,8 @@ jobs:
runs-on: windows-latest
steps:
- uses: actions/checkout@v6
with:
fetch-depth: 0
- uses: dtolnay/rust-toolchain@stable
- uses: Swatinem/rust-cache@v2
- name: build
@@ -55,3 +71,76 @@ jobs:
with:
name: numa-windows-x86_64
path: target/debug/numa.exe
integration-linux:
needs: [check]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: dtolnay/rust-toolchain@stable
- uses: Swatinem/rust-cache@v2
- name: build
run: cargo build --release
- name: install / verify / re-install / uninstall
run: |
sudo ./target/release/numa install
sleep 2
curl -sf http://127.0.0.1:5380/health
dig @127.0.0.1 example.com +short +timeout=5 | grep -q '.'
user=$(ps -o user= -p "$(systemctl show -p MainPID --value numa)" | tr -d ' ')
echo "numa running as: $user"
test "$user" != "root"
sudo ./target/release/numa install
sleep 2
curl -sf http://127.0.0.1:5380/health
sudo ./target/release/numa uninstall
sleep 1
! curl -sf http://127.0.0.1:5380/health 2>/dev/null
- name: diagnostics on failure
if: failure()
run: |
echo "=== systemctl status numa ==="
sudo systemctl status numa --no-pager -l || true
echo "=== journalctl -u numa (last 200) ==="
sudo journalctl -u numa --no-pager -n 200 || true
echo "=== ss -tulnp on 53/80/443/853/5380 ==="
sudo ss -tulnp 2>/dev/null | grep -E ':(53|80|443|853|5380)\b' || true
echo "=== systemctl is-active systemd-resolved ==="
systemctl is-active systemd-resolved || true
- name: cleanup
if: always()
run: |
sudo ./target/release/numa uninstall 2>/dev/null || true
# systemd-resolved has a ~40s DNS reconfiguration stall after
# restart (systemd issue #22521) that breaks the runner agent's
# connection to GitHub. Bridge it by replacing the stub-resolv
# symlink with a direct upstream — DNS works instantly and the
# runner can phone home for post-job steps.
sudo rm -f /etc/resolv.conf
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf > /dev/null
getent hosts github.com >/dev/null
integration-macos:
needs: [check-macos]
runs-on: macos-latest
steps:
- uses: actions/checkout@v6
- uses: dtolnay/rust-toolchain@stable
- uses: Swatinem/rust-cache@v2
- name: build
run: cargo build --release
- name: install / verify / re-install / uninstall
run: |
sudo ./target/release/numa install
sleep 2
curl -sf http://127.0.0.1:5380/health
dig @127.0.0.1 example.com +short +timeout=5 | grep -q '.'
sudo ./target/release/numa install
sleep 2
curl -sf http://127.0.0.1:5380/health
sudo ./target/release/numa uninstall
sleep 1
! curl -sf http://127.0.0.1:5380/health 2>/dev/null
- name: cleanup
if: always()
run: sudo ./target/release/numa uninstall 2>/dev/null || true

45
.github/workflows/docker.yml vendored Normal file
View File

@@ -0,0 +1,45 @@
name: Docker
on:
push:
tags:
- 'v*'
permissions:
contents: read
packages: write
jobs:
docker:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: docker/setup-qemu-action@v3
- uses: docker/setup-buildx-action@v3
- uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- uses: docker/metadata-action@v5
id: meta
with:
images: ghcr.io/${{ github.repository }}
tags: |
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
type=raw,value=latest
- uses: docker/build-push-action@v6
with:
context: .
platforms: linux/amd64,linux/arm64
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max

View File

@@ -23,6 +23,13 @@ name: Publish - Arch Linux AUR Package
on:
push:
branches: [main]
paths-ignore:
- 'site/**'
- 'blog/**'
- 'drafts/**'
- '*.md'
- 'scripts/serve-site.sh'
- 'scripts/generate-blog-index.sh'
workflow_dispatch:
permissions:

View File

@@ -32,7 +32,7 @@ jobs:
- name: Checkout
uses: actions/checkout@v6
- name: Install pandoc
run: sudo apt-get install -y pandoc
uses: pandoc/actions/setup@v1
- name: Generate blog HTML
run: make blog
- name: Setup Pages

3
.gitignore vendored
View File

@@ -3,3 +3,6 @@
CLAUDE.md
docs/
site/blog/posts/
ios/
drafts/
site/blog/index.html

917
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -1,6 +1,6 @@
[package]
name = "numa"
version = "0.10.3"
version = "0.14.1"
authors = ["razvandimescu <razvan@dimescu.com>"]
edition = "2021"
description = "Portable DNS resolver in Rust — .numa local domains, ad blocking, developer overrides, DNS-over-HTTPS"
@@ -10,7 +10,7 @@ keywords = ["dns", "dns-server", "ad-blocking", "reverse-proxy", "developer-tool
categories = ["network-programming", "development-tools"]
[dependencies]
tokio = { version = "1", features = ["rt-multi-thread", "macros", "net", "time", "sync"] }
tokio = { version = "1", features = ["rt-multi-thread", "macros", "net", "time", "sync", "signal"] }
axum = "0.8"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
@@ -29,12 +29,25 @@ rustls = "0.23"
tokio-rustls = "0.26"
arc-swap = "1"
ring = "0.17"
odoh-rs = "1"
psl = "2"
# rand_core 0.9 matches the version odoh-rs (via hpke 0.13) depends on, so we
# share one RngCore trait and OsRng impl across the dep tree.
rand_core = { version = "0.9", features = ["os_rng"] }
rustls-pemfile = "2.2.0"
qrcode = { version = "0.14", default-features = false, features = ["svg"] }
webpki-roots = "1"
[target.'cfg(windows)'.dependencies]
windows-service = "0.7"
[dev-dependencies]
criterion = { version = "0.8", features = ["html_reports"] }
tower = { version = "0.5", features = ["util"] }
http = "1"
hickory-resolver = { version = "0.25", features = ["https-ring", "webpki-roots"] }
hickory-proto = "0.25"
x509-parser = "0.18"
[[bench]]
name = "hot_path"
@@ -47,3 +60,7 @@ harness = false
[[bench]]
name = "dnssec"
harness = false
[[bench]]
name = "recursive_compare"
harness = false

View File

@@ -6,6 +6,7 @@ RUN mkdir src && echo 'fn main() {}' > src/main.rs && echo '' > src/lib.rs
RUN cargo build --release 2>/dev/null || true
RUN rm -rf src
COPY src/ src/
COPY benches/ benches/
COPY site/ site/
COPY numa.toml com.numa.dns.plist numa.service ./
RUN touch src/main.rs src/lib.rs
@@ -13,5 +14,6 @@ RUN cargo build --release
FROM alpine:3.23
COPY --from=builder /app/target/release/numa /usr/local/bin/numa
RUN mkdir -p /root/.config/numa && printf '[server]\napi_bind_addr = "0.0.0.0"\n\n[proxy]\nenabled = true\nbind_addr = "0.0.0.0"\n' > /root/.config/numa/numa.toml
EXPOSE 53/udp 80/tcp 443/tcp 853/tcp 5380/tcp
ENTRYPOINT ["numa"]

View File

@@ -32,6 +32,19 @@ blog:
pandoc "$$f" --template=site/blog-template.html -o "site/blog/posts/$$name.html"; \
echo " $$f → site/blog/posts/$$name.html"; \
done
@scripts/generate-blog-index.sh
blog-drafts: blog
@if [ -d drafts ] && ls drafts/*.md >/dev/null 2>&1; then \
for f in drafts/*.md; do \
name=$$(basename "$$f" .md); \
pandoc "$$f" --template=site/blog-template.html -o "site/blog/posts/$$name.html"; \
echo " $$f → site/blog/posts/$$name.html (draft)"; \
done; \
BLOG_INCLUDE_DRAFTS=1 scripts/generate-blog-index.sh; \
else \
echo " No drafts found"; \
fi
release:
ifndef VERSION

View File

@@ -9,7 +9,7 @@ url="https://github.com/razvandimescu/numa"
license=('MIT')
options=('!lto')
depends=('gcc-libs' 'glibc')
makedepends=('cargo' 'git')
makedepends=('cargo' 'git' 'llvm-libs')
provides=("$_pkgname")
conflicts=("$_pkgname")
backup=('etc/numa.toml')

View File

@@ -6,9 +6,9 @@
**DNS you own. Everywhere you go.** — [numa.rs](https://numa.rs)
A portable DNS resolver in a single binary. Block ads on any network, name your local services (`frontend.numa`), and override any hostname with auto-revert — all from your laptop, no cloud account or Raspberry Pi required.
A portable DNS resolver in a single binary. Block ads on any network, name your local services (`frontend.numa`), override any hostname with auto-revert, and seal every outbound query with **ODoH (RFC 9230)** so no single party sees both who you are and what you asked — all from your laptop, no cloud account or Raspberry Pi required.
Built from scratch in Rust. Zero DNS libraries. RFC 1035 wire protocol parsed by hand. Caching, ad blocking, and local service domains out of the box. Optional recursive resolution from root nameservers with full DNSSEC chain-of-trust validation, plus a DNS-over-TLS listener for encrypted client connections (iOS Private DNS, systemd-resolved, etc.). One ~8MB binary, everything embedded.
Built from scratch in Rust. Zero DNS libraries. Caching, ad blocking, and local service domains out of the box. Optional recursive resolution from root nameservers with full DNSSEC chain-of-trust validation, plus a DNS-over-TLS listener for encrypted client connections (iOS Private DNS, systemd-resolved, etc.). Run `numa relay` and the same binary becomes a public ODoH endpoint too — the curated DNSCrypt list currently has one surviving relay, so every Numa deploy materially expands the ecosystem. One ~8MB binary, everything embedded.
![Numa dashboard](assets/hero-demo.gif)
@@ -27,6 +27,9 @@ yay -S numa-git
# Windows — download from GitHub Releases
# All platforms
cargo install numa
# Docker
docker run -d --name numa --network host ghcr.io/razvandimescu/numa
```
```bash
@@ -77,6 +80,14 @@ DNSSEC validates the full chain of trust: RRSIG signatures, DNSKEY verification,
ALPN `"dot"` is advertised and enforced in both modes; a handshake with mismatched ALPN is rejected as a cross-protocol confusion defense.
**Phone setup** — point your iPhone or Android at Numa in one step:
```bash
numa setup-phone
```
Prints a QR code. Scan it, install the profile, toggle certificate trust — your phone's DNS now routes through Numa over TLS. Requires `[mobile] enabled = true` in `numa.toml`.
## LAN Discovery
Run Numa on multiple machines. They find each other automatically via mDNS:
@@ -94,6 +105,26 @@ From Machine B: `curl http://api.numa` → proxied to Machine A's port 8000. Ena
**Hub mode**: run one instance with `bind_addr = "0.0.0.0:53"` and point other devices' DNS to it — they get ad blocking + `.numa` resolution without installing anything.
## Docker
```bash
# Recommended — host networking (Linux)
docker run -d --name numa --network host ghcr.io/razvandimescu/numa
# Port mapping (macOS/Windows Docker Desktop)
docker run -d --name numa -p 53:53/udp -p 53:53/tcp -p 5380:5380 ghcr.io/razvandimescu/numa
```
Dashboard at `http://localhost:5380`. The image binds the API and proxy to `0.0.0.0` by default. Override with a custom config:
```bash
docker run -d --name numa --network host \
-v /path/to/numa.toml:/root/.config/numa/numa.toml \
ghcr.io/razvandimescu/numa
```
Multi-arch: `linux/amd64` and `linux/arm64`.
## How It Compares
| | Pi-hole | AdGuard Home | Unbound | Numa |
@@ -105,17 +136,22 @@ From Machine B: `curl http://api.numa` → proxied to Machine A's port 8000. Ena
| DNSSEC validation | — | — | Yes | Yes (RSA, ECDSA, Ed25519) |
| Ad blocking | Yes | Yes | — | 385K+ domains |
| Web admin UI | Full | Full | — | Dashboard |
| Encrypted upstream (DoH) | Needs cloudflared | Yes | — | Native |
| Encrypted upstream (DoH/DoT) | Needs cloudflared | DoH only | DoT only | DoH + DoT (`tls://`) |
| Encrypted clients (DoT listener) | Needs stunnel sidecar | Yes | Yes | Native (RFC 7858) |
| DoH server endpoint | — | Yes | — | Yes (RFC 8484) |
| Request hedging | — | — | — | All protocols (UDP, DoH, DoT) |
| Serve-stale + prefetch | — | — | Prefetch at 90% TTL | RFC 8767, prefetch at 90% TTL |
| Conditional forwarding | — | Yes | Yes | Yes (per-suffix rules) |
| Portable (laptop) | No (appliance) | No (appliance) | Server | Single binary, macOS/Linux/Windows |
| Community maturity | 56K stars, 10 years | 33K stars | 20 years | New |
## Performance
691ns cached round-trip. ~2.0M qps throughput. Zero heap allocations in the hot path. Recursive queries average 237ms after SRTT warmup (12x improvement over round-robin). ECDSA P-256 DNSSEC verification: 174ns. [Benchmarks →](bench/)
0.1ms cached queries — matches Unbound and AdGuard Home. Wire-level cache stores raw bytes with in-place TTL patching. Request hedging eliminates p99 spikes: cold recursive p99 538ms vs Unbound 748ms (28%), σ 4× tighter. [Benchmarks →](benches/)
## Learn More
- [Blog: DNS-over-TLS from Scratch in Rust](https://numa.rs/blog/posts/dot-from-scratch.html)
- [Blog: Implementing DNSSEC from Scratch in Rust](https://numa.rs/blog/posts/dnssec-from-scratch.html)
- [Blog: I Built a DNS Resolver from Scratch](https://numa.rs/blog/posts/dns-from-scratch.html)
- [Configuration reference](numa.toml) — all options documented inline
@@ -126,10 +162,16 @@ From Machine B: `curl http://api.numa` → proxied to Machine A's port 8000. Ena
- [x] DNS forwarding, caching, ad blocking, developer overrides
- [x] `.numa` local domains — auto TLS, path routing, WebSocket proxy
- [x] LAN service discovery — mDNS, cross-machine DNS + proxy
- [x] DNS-over-HTTPS — encrypted upstream
- [x] DNS-over-TLS listener — encrypted client connections (RFC 7858, ALPN strict)
- [x] DNS-over-HTTPS — encrypted upstream + server endpoint (RFC 8484)
- [x] DNS-over-TLS — encrypted client listener (RFC 7858) + upstream forwarding (`tls://`)
- [x] Recursive resolution + DNSSEC — chain-of-trust, NSEC/NSEC3
- [x] SRTT-based nameserver selection
- [x] Multi-forwarder failover — multiple upstreams with SRTT ranking, fallback pool
- [x] Request hedging — parallel requests rescue packet loss and tail latency (all protocols)
- [x] Serve-stale + prefetch — RFC 8767, background refresh at <10% TTL and on stale serve
- [x] Conditional forwarding — per-suffix rules for split-horizon DNS (Tailscale, VPNs)
- [x] Cache warming — proactive resolution for configured domains
- [x] Mobile onboarding — `setup-phone` QR flow, mobile API, mobileconfig profiles
- [ ] pkarr integration — self-sovereign DNS via Mainline DHT
- [ ] Global `.numa` names — DHT-backed, no registrar

View File

@@ -0,0 +1,30 @@
[server]
bind_addr = "127.0.0.1:5454"
api_port = 5381
api_bind_addr = "127.0.0.1"
data_dir = "/tmp/numa-bench"
[upstream]
mode = "recursive"
timeout_ms = 10000
[cache]
min_ttl = 60
max_ttl = 3600
[blocking]
enabled = false
[proxy]
port = 8080
tls_port = 8443
[dot]
enabled = true
port = 8530
[mobile]
enabled = false
[lan]
enabled = false

31
benches/numa-bench.toml Normal file
View File

@@ -0,0 +1,31 @@
[server]
bind_addr = "127.0.0.1:5454"
api_port = 5381
api_bind_addr = "127.0.0.1"
data_dir = "/tmp/numa-bench"
[upstream]
mode = "forward"
address = ["https://9.9.9.9/dns-query"]
timeout_ms = 10000
[cache]
min_ttl = 60
max_ttl = 3600
[blocking]
enabled = false
[proxy]
port = 8080
tls_port = 8443
[dot]
enabled = true
port = 8530
[mobile]
enabled = false
[lan]
enabled = false

1100
benches/recursive_compare.rs Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -1,7 +1,7 @@
---
title: I Built a DNS Resolver from Scratch in Rust
description: How DNS actually works at the wire level — label compression, TTL tricks, DoH, and what surprised me building a resolver with zero DNS libraries.
date: March 2026
date: 2026-03-20
---
I wanted to understand how DNS actually works. Not the "it translates domain names to IP addresses" explanation — the actual bytes on the wire. What does a DNS packet look like? How does label compression work? Why is everything crammed into 512 bytes?

View File

@@ -1,7 +1,7 @@
---
title: Implementing DNSSEC from Scratch in Rust
description: Recursive resolution from root hints, chain-of-trust validation, NSEC/NSEC3 denial proofs, and what I learned implementing DNSSEC with zero DNS libraries.
date: March 2026
date: 2026-03-28
---
In the [previous post](/blog/posts/dns-from-scratch.html) I covered how DNS works at the wire level — packet format, label compression, TTL caching, DoH. Numa was a forwarding resolver: it parsed packets, did useful things locally, and relayed the rest to Cloudflare or Quad9.
@@ -163,12 +163,12 @@ The fix has three parts:
**TCP fallback.** Every outbound query tries UDP first (800ms timeout). If UDP fails or the response is truncated, retry immediately over TCP. TCP uses a 2-byte length prefix before the DNS message — trivial to implement, and it handles DNSSEC responses that exceed the UDP payload limit.
**UDP auto-disable.** After 3 consecutive UDP failures, flip a global `AtomicBool` and skip UDP entirely — go TCP-first for all queries. This avoids burning 800ms per hop on a network where UDP will never work. The flag resets when the network changes (detected via LAN IP monitoring).
**UDP auto-disable.** After 3 consecutive UDP failures, flip a global `AtomicBool` and skip UDP entirely — go TCP-first for all queries. The flag resets when the network changes (detected via LAN IP monitoring).
<img src="../hostile-network.svg" alt="Latency profile on a hostile network: queries 1-3 each spend 800ms waiting for a UDP timeout before retrying over TCP, taking 1,100ms total per query. After 3 consecutive failures the UDP auto-disable flag flips, and queries 4+ go TCP-first and complete in 300ms each — 3.7× faster.">
**Query minimization (RFC 7816).** When querying root servers, send only the TLD — `com` instead of `secret-project.example.com`. Root servers handle trillions of queries and are operated by 12 organizations. Minimization reduces what they learn from yours.
The result: on a network that blocks UDP:53, Numa detects the block within the first 3 queries, switches to TCP, and resolves normally at 300-500ms per cold query. Cached queries remain 0ms. No manual config change needed — switch networks and it adapts.
I wouldn't have found this without dogfooding. The code worked perfectly on my home network. It took a real hostile network to expose the assumption that UDP always works.
## What I learned

176
blog/dot-from-scratch.md Normal file
View File

@@ -0,0 +1,176 @@
---
title: DNS-over-TLS from Scratch in Rust
description: Building RFC 7858 on top of rustls — length-prefix framing, ALPN cross-protocol defense, and two bugs that only the strict clients caught.
date: 2026-04-06
---
The [previous post](/blog/posts/dnssec-from-scratch.html) ended with "DoT — the last encrypted transport we don't support." This post is about building it.
Numa now runs a DoT listener on port 853. My iPhone uses it as its system resolver, so ad blocking, DNSSEC validation, and recursive resolution follow my phone through the day. No cloud, no account, no companion app — a self-signed cert, a `.mobileconfig` profile, and a QR code in the terminal.
RFC 7858 is ten pages. The hard parts weren't in the RFC. They were in cross-protocol confusion defenses, a crypto-provider init gotcha that only triggered in one specific config combination, and a certificate SAN bug iOS was happy to accept and `kdig` immediately rejected. This post is about those parts.
## Why DoT when you already have DoH?
Numa has shipped DoH since v0.1. Both protocols tunnel DNS over TLS; DoH wraps queries in HTTP/2, DoT is DNS-over-TCP with TLS in front. Same privacy guarantees, different wrapper.
The answer to "why both" is that **phones ask for DoT by name.** iOS system DNS configures it with two fields (IP + server name) instead of a URL template. Android 9+ "Private DNS" speaks DoT natively. Linux stubs default to DoT. I wanted my phone on Numa without installing anything on the phone itself, and DoT is the protocol iOS and Android already speak for that.
## The wire format is refreshingly small
RFC 7858 is one sentence of wire protocol: *DNS-over-TCP (RFC 1035 §4.2.2) with TLS in front, on port 853.* DNS-over-TCP has existed since 1987 — a 2-byte length prefix followed by the DNS message. DoT is that, wrapped in a TLS session. The entire framing code is seven lines:
```rust
async fn write_framed<S>(stream: &mut S, msg: &[u8]) -> io::Result<()>
where S: AsyncWriteExt + Unpin {
let mut out = Vec::with_capacity(2 + msg.len());
out.extend_from_slice(&(msg.len() as u16).to_be_bytes());
out.extend_from_slice(msg);
stream.write_all(&out).await?;
stream.flush().await
}
```
Reads are symmetric: `read_exact` two bytes, convert to `u16`, `read_exact` that many bytes. No HTTP headers, no chunked encoding, no framing layer.
## Persistent connections
A fresh TCP+TLS handshake is at least 3 RTTs — about 300ms on a 100ms connection, 60× the cost of a UDP query. RFC 7858 §3.4 says clients SHOULD reuse the TCP connection for multiple queries, and every real DoT client does: iOS, Android, systemd, stubby. A single connection often carries hundreds of queries.
<img src="../dot-handshake.svg" alt="Timing diagram comparing a DNS lookup over plain UDP (1 RTT), over DoT on a fresh connection (3 RTTs — TCP handshake, TLS 1.3 handshake, then the query), and over a reused DoT session (1 RTT, same as UDP).">
The amortization point is the whole game. If you only ever do one query per connection, DoT is roughly 3× slower than UDP and you should not use it. If you reuse the same TLS session for a browsing session's worth of queries, the handshake is paid once and every subsequent query is effectively free.
The server is a loop that reads a length-prefixed message, resolves it, writes the response framed the same way, waits for the next one. Three timeouts keep it honest:
- **Handshake timeout (10s)** — a slowloris that opens TCP but never sends a ClientHello can't pin a worker.
- **Idle timeout (30s)** — a connected client with nothing to say gets dropped.
- **Write timeout (10s)** — a stalled reader can't hold a response buffer indefinitely.
A semaphore caps concurrent connections at 512 so a burst of handshakes can't exhaust the tokio runtime.
## ALPN, the cross-protocol defense that matters
If DoT lives on port 853 and HTTPS on 443, what stops an HTTP/2 client from hitting 853 and getting confused replies? [Cross-protocol attacks](https://alpaca-attack.com/) exist and have had real CVEs. The defense is ALPN: during the TLS handshake the client advertises protocols, the server picks one it supports or fails. A DoT server advertises `"dot"`; a client offering only `"h2"` gets a `no_application_protocol` fatal alert before any frames are exchanged.
rustls enforces this by default when you set `alpn_protocols`:
```rust
let mut config = ServerConfig::builder()
.with_no_client_auth()
.with_single_cert(certs, key)?;
config.alpn_protocols = vec![b"dot".to_vec()];
```
"The library enforces it by default" has a latent risk: a future rustls upgrade could change the default, and the defense would quietly evaporate. I wrote a test that pins the behavior so any regression in a dependency update fails loudly:
```rust
#[tokio::test]
async fn dot_rejects_non_dot_alpn() {
let (addr, cert_der) = spawn_dot_server().await;
let client_config = dot_client(&cert_der, vec![b"h2".to_vec()]);
let connector = tokio_rustls::TlsConnector::from(client_config);
let tcp = tokio::net::TcpStream::connect(addr).await.unwrap();
let result = connector
.connect(ServerName::try_from("numa.numa").unwrap(), tcp)
.await;
assert!(result.is_err(),
"DoT server must reject ALPN that doesn't include \"dot\"");
}
```
When you're leaning on a library's default for a security-critical invariant, the test is the contract.
## Two bugs that hid for days
Both were fixed before v0.10 shipped. Both stayed hidden because my initial tests used *permissive* clients.
### The rustls crypto provider panic
rustls 0.23 requires a `CryptoProvider` installed before you can build a `ServerConfig`. Numa's HTTPS proxy calls `install_default` as a side effect when it builds its own config, so DoT "just worked" for users who enabled both — the proxy had already initialized the provider before DoT's first handshake.
Then I added support for user-provided DoT certificates. Someone running DoT with their own Let's Encrypt cert, with the HTTPS proxy disabled, would hit:
```
thread 'dot' panicked at rustls-0.23.25/src/crypto/mod.rs:185:14:
no process-level CryptoProvider available -- call
CryptoProvider::install_default() before this point
```
The panic happened on the first client connection, not at startup. While writing the integration suite for "DoT with BYO cert, proxy disabled" — the one combination nobody had ever actually exercised — the first run panicked. Fix is two lines: call `install_default` inside `load_tls_config` so DoT can stand alone. If a side effect initializes something and you have a path that skips that side effect, you have a bug waiting for a specific deployment.
### The SAN bug iOS was happy to accept
Numa's self-signed DoT cert is generated on first run from a local CA alongside the data directory. It needs to match whatever `ServerName` the client sends as SNI. For the HTTPS proxy, that's the wildcard domain pattern `*.numa` (matching `frontend.numa`, `api.numa`, etc.). I initially reused the same SAN list for DoT: a wildcard `*.numa` and nothing else.
On an iPhone this worked perfectly. Full browsing session, persistent connections in the log, ad blocking active. I was about to merge when I ran one last smoke test with `kdig` (GnuTLS-backed, from [Knot DNS](https://www.knot-dns.cz/)):
```
$ kdig @192.168.1.16 -p 853 +tls \
+tls-ca=/usr/local/var/numa/ca.pem \
+tls-hostname=numa.numa example.com A
;; TLS, handshake failed (Error in the certificate.)
```
Huh.
[RFC 6125 §6.4.3](https://datatracker.ietf.org/doc/html/rfc6125#section-6.4.3): a wildcard in a certificate's DNS-ID matches exactly one label. `*.numa` matches `frontend.numa`, but not `numa.numa`, because the wildcard wants at least one label to substitute and strict clients reject wildcards in the leftmost label under single-label TLDs as ambiguous.
iOS's TLS stack is lenient and accepts it. GnuTLS, NSS (Firefox), and most non-Apple validators don't. The fix is five lines — add an explicit `numa.numa` SAN alongside the wildcard. But the lesson is the one that stuck: I wrote a commit message saying "fix an iOS bug" and had to rewrite it, because iOS was fine. The real bug was that every GnuTLS/NSS-based client on the planet would have rejected the cert, and I only found it by running one more test with a stricter tool.
> Test with the strict client. The permissive client hides your bugs.
## Getting your phone onto it
A DoT server is useless without a way to point a phone at it. iOS won't let you type an IP and a server name into Settings directly — you install a `.mobileconfig` profile that bundles the CA as a trust anchor and the DNS settings in a single payload.
Numa ships a subcommand that builds one on the fly and serves it over a QR code in the terminal:
```
$ numa setup-phone
Numa Phone Setup
Profile URL: http://192.168.1.10:8765/mobileconfig
██████████████████████████████
██ ██
██ [QR code rendered in ██
██ your terminal] ██
██ ██
██████████████████████████████
On your iPhone:
1. Open Camera, point at the QR code, tap the yellow banner
2. Allow the download when Safari asks
3. Open Settings — tap "Profile Downloaded" near the top
(or: Settings → General → VPN & Device Management → Numa DNS)
4. Tap Install (top right), enter passcode, Install again
5. Settings → General → About → Certificate Trust Settings
Toggle ON "Numa Local CA" — required for DoT to work
```
The same QR is available in the dashboard — click "Phone Setup" in the header and the popover renders an SVG QR code pointing at the mobileconfig URL. On mobile viewports it shows a direct download link instead.
<img src="../phone-setup-dashboard.png" alt="Numa dashboard with Phone Setup popover showing QR code and install instructions">
Step 4 is non-negotiable. Even though the CA is bundled in the same profile that installs the DNS settings, iOS still requires the user to explicitly toggle trust in Certificate Trust Settings. It's a deliberate iOS policy to prevent profile-based trust injection — annoying, and correct.
I've been dogfooding this since v0.10 shipped in early April. The phone resolves through Numa over DoT whenever I'm home; persistent connections are visible in the log as a single source port living through dozens of queries. The one real caveat: if the laptop's LAN IP changes, the profile breaks. [RFC 9462 DDR](https://datatracker.ietf.org/doc/html/rfc9462) fixes that — Numa can respond to `_dns.resolver.arpa IN SVCB` with its current IP and iOS picks it up on each network join. Next piece of work.
## What I learned
**RFC-level small, API-level hard.** RFC 7858 is ten pages. The framing is trivial. But the subtle stuff — ALPN, timeouts, connection caps, handshake vs idle vs write deadlines, backoff on accept errors — isn't in the RFC. Miss any of it and you leak a DoS vector or a protocol confusion hole.
**Your test matrix is your security matrix.** Both bugs in this post were hidden by lenient clients. In both cases the strict client — kdig, or a specific config combination — surfaced the bug instantly. Pick test tools for strictness, not convenience. The moment you find yourself thinking "but iOS accepts it," stop and run kdig.
**Don't initialize global state via side effects.** "Module A installs a global, module B silently depends on it, disabling A breaks B" is a bug pattern that keeps coming back. Fix: have module B initialize its dependency explicitly, even if it means calling an idempotent `install_default` twice. The dependency graph should be local and obvious.
## What's next
- ~~**DoH server**~~ — shipped in v0.12.0. `POST /dns-query` accepts [RFC 8484](https://datatracker.ietf.org/doc/html/rfc8484) wire-format queries, so Firefox/Chrome can point their built-in DoH at Numa.
- **DoQ server (RFC 9250)** — DNS over QUIC. Android 14+ supports it natively.
- **DDR (RFC 9462)** — auto-discovery via `_dns.resolver.arpa IN SVCB`, so phones pick up a moved Numa instance without the installed profile going stale.
The code is at [github.com/razvandimescu/numa](https://github.com/razvandimescu/numa) — the DoT listener is in [`src/dot.rs`](https://github.com/razvandimescu/numa/blob/main/src/dot.rs) and the phone onboarding flow is in [`src/setup_phone.rs`](https://github.com/razvandimescu/numa/blob/main/src/setup_phone.rs) and [`src/mobileconfig.rs`](https://github.com/razvandimescu/numa/blob/main/src/mobileconfig.rs). MIT license.

View File

@@ -0,0 +1,171 @@
---
title: Fixing DNS tail latency with a 5-line config and a 50-line function
description: Periodic 40-140ms DoH spikes from hyper's dispatch channel. The fix was reqwest window tuning and request hedging — Dean & Barroso's "The Tail at Scale," applied to a DNS forwarder. Same ideas took cold recursive p99 from 2.3 seconds to 538ms.
date: 2026-04-12
---
If you're using reqwest for small HTTP/2 payloads, you probably have a tail latency problem you don't know about. Hyper's default flow control windows are 10,000× oversized for anything under 1 KB, and its dispatch channel adds periodic 40-140ms stalls that don't show up in median benchmarks.
I hit this building Numa's DoH forwarding path. Median was 10ms, mean was 23ms — the tail was dragging everything.
<div class="hero-metrics">
<div class="metric-card">
<div class="metric-vs">DoH forwarding p99</div>
<div class="metric-value">113 → 71ms</div>
<div class="metric-label">window tuning + request hedging</div>
</div>
<div class="metric-card">
<div class="metric-vs">Cold recursive p99</div>
<div class="metric-value">2.3s → 538ms</div>
<div class="metric-label">NS caching, serve-stale, parallel queries</div>
</div>
<div class="metric-card">
<div class="metric-vs">Forwarding σ</div>
<div class="metric-value">31 → 13ms</div>
<div class="metric-label">random spikes become parallel races</div>
</div>
</div>
The fix was a 5-line reqwest config and a 50-line hedging function. This post is also an advertisement for Dean & Barroso's 2013 paper ["The Tail at Scale"](https://research.google/pubs/pub40801/) — a decade-old idea that still demolishes dispatch spikes. The same ideas later took my cold recursive p99 from 2.3 seconds to 538ms.
---
## The cause: hyper's dispatch channel
Reqwest sits on top of hyper, which interposes an mpsc dispatch channel and a separate `ClientTask` between `.send()` and the h2 stream. I instrumented the forwarding path and confirmed: 100% of the spike time lives in the `send()` phase, and a parallel heartbeat task showed zero runtime lag during spikes. The tokio runtime was fine — the stall was internal to hyper's request scheduling.
Hickory-resolver doesn't have this issue. It holds `h2::SendRequest<Bytes>` directly and calls `ready().await; send_request()` in the caller's task — no channel, no scheduling dependency. I used it as a reference point throughout.
## Fix #1 — HTTP/2 window sizes
Reqwest inherits hyper's HTTP/2 defaults: 2 MB stream window, 5 MB connection window. For DNS responses (~200 bytes), that's ~10,000× oversized — unnecessary WINDOW_UPDATE frames, bloated bookkeeping on every poll, and different server-side scheduling behavior.
Setting both windows to the h2 spec default (64 KB) dropped my median from 13.3ms to 10.1ms:
```rust
reqwest::Client::builder()
.use_rustls_tls()
.http2_initial_stream_window_size(65_535)
.http2_initial_connection_window_size(65_535)
.http2_keep_alive_interval(Duration::from_secs(15))
.http2_keep_alive_while_idle(true)
.http2_keep_alive_timeout(Duration::from_secs(10))
.pool_idle_timeout(Duration::from_secs(300))
.pool_max_idle_per_host(1)
.build()
```
**Any Rust code using reqwest for tiny-payload HTTP/2 workloads — DoH, API polling, metric scraping — is probably hitting this.**
## Fix #2 — Request hedging
["The Tail at Scale"](https://research.google/pubs/pub40801/) (Dean & Barroso, 2013): fire a request, and if it doesn't return within your P50 latency, fire the same request in parallel. First response wins.
The intuition: if 5% of requests spike due to independent random events, two parallel requests means only 0.25% of pairs spike on *both*. The tail collapses.
**The surprise: hedging against the same upstream works.** HTTP/2 multiplexes streams — two `send_request()` calls on one connection become independent h2 streams. If one stalls in the dispatch channel, the other keeps making progress.
```rust
pub async fn forward_with_hedging_raw(
wire: &[u8],
primary: &Upstream,
secondary: &Upstream,
hedge_delay: Duration,
timeout_duration: Duration,
) -> Result<Vec<u8>> {
let primary_fut = forward_query_raw(wire, primary, timeout_duration);
tokio::pin!(primary_fut);
let delay = sleep(hedge_delay);
tokio::pin!(delay);
// Phase 1: wait for primary to return OR the hedge delay.
tokio::select! {
result = &mut primary_fut => return result,
_ = &mut delay => {}
}
// Phase 2: hedge delay expired — fire secondary, keep primary alive.
let secondary_fut = forward_query_raw(wire, secondary, timeout_duration);
tokio::pin!(secondary_fut);
// First successful response wins.
tokio::select! {
r = primary_fut => r,
r = secondary_fut => r,
}
}
```
The [production version](https://github.com/razvandimescu/numa/blob/main/src/forward.rs#L267) adds error handling — if one leg fails, it waits for the other. In production, Numa passes the same `&Upstream` twice when only one is configured. I extended hedging to all protocols — UDP (rescues packet loss on WiFi), DoT (rescues TLS handshake stalls). Configurable via `hedge_ms`; set to 0 to disable.
**Caveat: hedging hurts on degraded networks.** When latency is consistently high (no random spikes, just slow), the hedge adds overhead with nothing to rescue. Hedging is a variance reducer, not a latency reducer — it only helps when spikes are *random*.
---
## Forwarding results
5 iterations × 101 domains × 10 rounds, 5,050 samples per method. Hickory-resolver included as a reference (it uses h2 directly, no dispatch channel):
| | Single | **Hedged** | Hickory (ref) |
|---|---|---|---|
| mean | 17.4ms | **14.3ms** | 16.8ms |
| median | 10.4ms | **10.2ms** | 13.3ms |
| p95 | 52.5ms | **28.6ms** | 37.7ms |
| p99 | 113.4ms | **71.3ms** | 98.1ms |
| σ | 30.6ms | **13.2ms** | 19.1ms |
The internal improvement: hedging cut p95 by 45%, p99 by 37%, σ by 57%. The exact margin vs hickory varies with network conditions; the σ reduction is consistent across runs.
## Recursive resolution: from 2.3 seconds to 538ms
Forwarding is one job. Recursive resolution — walking from root hints through TLD nameservers to the authoritative server — is a different one. I started 15× behind Unbound on cold recursive p99 and traced it to four root causes.
**1. Missing NS delegation caching.** I cached glue records (ns1's IP) but not the delegation itself. Every `.com` query walked from root. Fix: cache NS records from referral authority sections. (10 lines)
**2. Expired cache entries caused full cold resolutions.** Fix: serve-stale ([RFC 8767](https://www.rfc-editor.org/rfc/rfc8767)) — return expired entries with TTL=1 while revalidating in the background. (20 lines)
**3. Wasting 1,900ms per unreachable server.** 800ms UDP timeout + unconditional 1,500ms TCP fallback. Fix: 400ms UDP, TCP only for truncation. (5 lines)
**4. Sequential NS queries on cold starts.** Fix: fire to the top 2 nameservers simultaneously. First response wins, SRTT recorded for both. Same hedging principle. (50 lines)
<div class="before-after">
<div class="ba-item">
<div class="ba-label">p99 before</div>
<div class="ba-value ba-before">2,367ms</div>
</div>
<div class="ba-arrow">&#8594;</div>
<div class="ba-item">
<div class="ba-label">p99 after</div>
<div class="ba-value ba-after">538ms</div>
</div>
<div class="ba-item ba-ref">
<div class="ba-label">Unbound (ref)</div>
<div class="ba-value">748ms</div>
</div>
</div>
Genuine cold benchmarks — unique subdomains, 1 query per domain, 5 iterations, 505 samples per server:
| | Baseline | Final | Unbound (ref) |
|---|---|---|---|
| p99 | 2,367ms | **538ms** | 748ms |
| σ | 254ms | **114ms** | 457ms |
| median | — | 77.6ms | 74.7ms |
Unbound wins median by ~4%. Where hedging shines is the tail — domains with slow or unreachable nameservers, where parallel queries turn worst-case sequential timeouts into races. Cache hits are tied at 0.1ms across Numa, Unbound, and AdGuard Home.
What I'm exploring next: persistent SRTT data across restarts (currently cold-starts lose all server timing), aggressive NSEC caching to shortcut negative lookups, and adaptive hedge delays that tune themselves to observed network conditions instead of a fixed 10ms.
---
## Takeaways
The real hero of this post is Dean & Barroso. Hedging works because **spikes are random, and two random draws rarely both lose**. It's effective for any HTTP/2 client, any language, any forwarder topology. Nobody we know of ships it by default.
If you're building a Rust service that makes many small HTTP/2 requests to the same backend: check your flow control window sizes first, then implement hedging. Don't rewrite the client.
Benchmarks are in [`benches/recursive_compare.rs`](https://github.com/razvandimescu/numa/blob/main/benches/recursive_compare.rs) — run them yourself. If you're using reqwest for tiny-payload workloads and try the window size fix, I'd love to hear if you see the same improvement.
---
Numa is a DNS resolver that runs on your laptop or phone. DoH, DoT, .numa local domains, ad blocking, developer overrides, a REST API, and all the optimization work in this post. [github.com/razvandimescu/numa](https://github.com/razvandimescu/numa).

48
build.rs Normal file
View File

@@ -0,0 +1,48 @@
fn main() {
// --long forces "TAG-N-gSHA[-dirty]" format even on exact tag matches,
// making parsing unambiguous for pre-release tags like v0.14.0-rc1.
let git_version = std::process::Command::new("git")
.args(["describe", "--tags", "--always", "--dirty", "--long"])
.output()
.ok()
.filter(|o| o.status.success())
.and_then(|o| String::from_utf8(o.stdout).ok())
.and_then(|raw| parse_git_describe(raw.trim()));
if let Some(v) = git_version {
println!("cargo:rustc-env=NUMA_BUILD_VERSION={}", v);
}
println!("cargo:rerun-if-changed=.git/HEAD");
}
/// Parse `git describe --long` output into a SemVer-compatible string.
/// "v0.13.1-0-ga87f907" → "0.13.1"
/// "v0.13.1-9-ga87f907" → "0.13.1+a87f907"
/// "v0.14.0-rc1-0-ga87f907" → "0.14.0-rc1"
/// "v0.14.0-rc1-3-ga87f907-dirty" → "0.14.0-rc1+a87f907-dirty"
/// "a87f907" → "0.0.0+a87f907"
fn parse_git_describe(s: &str) -> Option<String> {
let s = s.strip_prefix('v').unwrap_or(s);
let dirty = s.ends_with("-dirty");
let s = s.strip_suffix("-dirty").unwrap_or(s);
// --long format: TAG-N-gSHA. Split from the right so tags with hyphens work.
let gpos = s.rfind("-g")?;
let sha = &s[gpos + 2..];
let rest = &s[..gpos];
let npos = rest.rfind('-')?;
let n: u32 = rest[npos + 1..].parse().ok()?;
let tag = &rest[..npos];
if tag.is_empty() {
return Some(format!("0.0.0+{}", sha));
}
Some(match (n, dirty) {
(0, false) => tag.to_string(),
(0, true) => format!("{}+{}-dirty", tag, sha),
(_, false) => format!("{}+{}", tag, sha),
(_, true) => format!("{}+{}-dirty", tag, sha),
})
}

View File

@@ -8,6 +8,39 @@ Type=simple
ExecStart={{exe_path}}
Restart=always
RestartSec=2
# Transient system user per start; no PKGBUILD/sysusers setup required.
# systemd remaps the StateDirectory ownership to the dynamic UID on each
# launch, including legacy root-owned trees from pre-drop installs.
DynamicUser=yes
AmbientCapabilities=CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
StateDirectory=numa
StateDirectoryMode=0750
ConfigurationDirectory=numa
ConfigurationDirectoryMode=0755
# Sandboxing — conservative set known to work with Rust network daemons.
# Aggressive hardening (MemoryDenyWriteExecute, SystemCallFilter, seccomp
# allow-lists) can be layered on once tested in isolation.
NoNewPrivileges=true
ProtectSystem=strict
# DynamicUser= sets ProtectHome=read-only by default — leaves /home
# readable so systemd can exec binaries installed under it (cargo install,
# source builds), while blocking writes to user $HOMEs. Don't set =yes:
# that hides /home entirely and fails with status=203/EXEC.
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictRealtime=true
RestrictSUIDSGID=true
# AF_NETLINK for interface enumeration on network changes
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX AF_NETLINK
StandardOutput=journal
StandardError=journal
SyslogIdentifier=numa

View File

@@ -8,15 +8,32 @@ api_port = 5380
# %PROGRAMDATA%\numa on windows. Override for
# containerized deploys or tests that can't
# write to the system path.
# filter_aaaa = true # on IPv4-only networks, answer AAAA queries with
# NODATA (NOERROR + empty answer) so Happy Eyeballs
# clients don't wait on a v6 attempt that can't
# succeed. Also strips `ipv6hint` from HTTPS/SVCB
# records (RFC 9460) so modern browsers (Chrome
# ≥103, Firefox, Safari) don't bypass the AAAA
# filter via SVCB hints. Local zones, overrides,
# and the .numa proxy are NOT filtered — you can
# still configure v6 records for local services.
# Default: false.
# [upstream]
# mode = "forward" # "forward" (default) — relay to upstream
# # "recursive" — resolve from root hints (no address needed)
# address = "9.9.9.9" # single upstream (plain UDP)
# address = ["192.168.1.1", "9.9.9.9:5353"] # multiple upstreams — SRTT picks fastest
# address = "https://dns.quad9.net/dns-query" # DNS-over-HTTPS (encrypted)
# address = "https://cloudflare-dns.com/dns-query" # Cloudflare DoH
# address = "9.9.9.9" # plain UDP
# port = 53 # only for forward mode, plain UDP
# address = "tls://9.9.9.9#dns.quad9.net" # DNS-over-TLS (encrypted, port 853)
# fallback = ["8.8.8.8", "1.1.1.1"] # tried only when all primaries fail
# port = 53 # default port for addresses without :port
# timeout_ms = 3000
# hedge_ms = 10 # request hedging delay (ms). After this delay
# # without a response, fires a parallel request
# # to the same upstream. Rescues packet loss (UDP),
# # dispatch spikes (DoH), TLS stalls (DoT).
# # Set to 0 to disable. Default: 10
# root_hints = [ # only used in recursive mode
# "198.41.0.4", # a.root-servers.net (Verisign)
# "199.9.14.201", # b.root-servers.net (USC-ISI)
@@ -44,6 +61,29 @@ api_port = 5380
# "co", "br", "au", "ca", "jp", # other major ccTLDs
# ]
# [[forwarding]] # per-suffix conditional forwarding rules
# suffix = "168.192.in-addr.arpa" # single suffix → one upstream
# upstream = "100.90.1.63:5361"
#
# [[forwarding]]
# suffix = ["home.local", "home.arpa"] # multiple suffixes → same upstream
# upstream = "10.0.0.1" # port 53 default
#
# [[forwarding]] # DoT upstream: tls://IP[:port]#hostname
# suffix = ["google.com", "goog"] # hostname is the TLS SNI / cert name
# upstream = "tls://9.9.9.9#dns.quad9.net" # port 853 default
#
# [[forwarding]] # DoH upstream: full https:// URL
# suffix = "example.corp"
# upstream = "https://dns.quad9.net/dns-query"
#
# [[forwarding]] # array of upstreams → SRTT-aware failover
# suffix = ["google.com", "goog"] # fastest-healthy first, dead one skipped
# upstream = [
# "tls://9.9.9.9#dns.quad9.net",
# "tls://149.112.112.112#dns.quad9.net",
# ]
# [blocking]
# enabled = true # set to false to disable ad blocking
# refresh_hours = 24
@@ -51,9 +91,10 @@ api_port = 5380
# allowlist = ["example.com"] # domains to never block
[cache]
max_entries = 10000
max_entries = 100000
min_ttl = 60
max_ttl = 86400
# warm = ["google.com", "github.com"] # resolve at startup, refresh before TTL expiry
[proxy]
enabled = true
@@ -91,7 +132,7 @@ tld = "numa"
# DNS-over-TLS listener (RFC 7858) — encrypted DNS on port 853
# [dot]
# enabled = false # opt-in: accept DoT queries
# enabled = true # on by default; set false to disable
# port = 853 # standard DoT port
# bind_addr = "0.0.0.0" # IPv4 or IPv6; unspecified binds all interfaces
# cert_path = "/etc/numa/dot.crt" # PEM cert; omit to use self-signed (proxy CA if available)
@@ -102,3 +143,22 @@ tld = "numa"
# enabled = true # discover other Numa instances via mDNS (_numa._tcp.local)
# broadcast_interval_secs = 30
# peer_timeout_secs = 90
# Mobile API — persistent HTTP listener serving read-only routes
# (/health, /ca.pem, /mobileconfig, /ca.mobileconfig) on a LAN-reachable
# port. Consumed by the iOS/Android companion apps for discovery and
# profile fetching, and by `numa setup-phone` for QR-based onboarding.
#
# Opt-in because the listener binds to the LAN by default. None of the
# exposed routes are cryptographically sensitive (no private keys, no
# state mutations, all idempotent GETs), but enabling it does add a new
# listener to any device on the LAN that scans port 8765.
#
# Safe for home LANs. Think twice before enabling on untrusted LANs
# (office Wi-Fi, coffee shops, etc.) — an attacker on the same network
# could run a competing Numa instance that shadows yours via mDNS and
# trick companion apps into installing their profile instead of yours.
[mobile]
enabled = true # opt-in to the mobile API listener
# port = 8765 # default; matches Discovery.swift defaultAPIPort
# bind_addr = "0.0.0.0" # default; set to "127.0.0.1" for localhost-only

15
packaging/relay/Caddyfile Normal file
View File

@@ -0,0 +1,15 @@
odoh-relay.example.com {
handle /relay {
reverse_proxy numa-relay:8443
}
handle /health {
reverse_proxy numa-relay:8443
}
respond 404
# Per-request access logs defeat the point of an oblivious relay.
# Aggregate counters are exposed at /health on the relay itself.
log {
output discard
}
}

48
packaging/relay/README.md Normal file
View File

@@ -0,0 +1,48 @@
# Numa ODoH Relay — Docker deploy
Two-container deploy: Caddy terminates TLS (auto-provisioning a Let's Encrypt
cert via ACME) and reverse-proxies to a Numa relay running on an internal
Docker network. The relay never reads sealed payloads; Caddy never logs them.
## Prerequisites
- A host with public 80/443 reachable from the internet.
- A DNS record (`A` or `AAAA`) pointing your chosen hostname at the host.
- Docker + Docker Compose v2.
## Configure
Edit `Caddyfile` and replace `odoh-relay.example.com` with your hostname.
That hostname is what ACME validates against and what ODoH clients will
configure as their relay URL: `https://<hostname>/relay`.
## Deploy
```sh
docker compose up -d
docker compose logs -f caddy # watch ACME provisioning
```
First boot takes a few seconds while Caddy obtains the cert. Subsequent
restarts reuse the cached cert from the `caddy_data` volume.
## Verify
```sh
curl https://<hostname>/health
# ok
# total 0
# forwarded_ok 0
# forwarded_err 0
# rejected_bad_request 0
```
Then point any ODoH client at `https://<hostname>/relay` and watch the
counters tick.
## Listing on the public ecosystem
DNSCrypt's [v3/odoh-relays.md](https://github.com/DNSCrypt/dnscrypt-resolvers/blob/master/v3/odoh-relays.md)
is the canonical list. The pruned 2025-09-16 commit shows one public ODoH
relay survived the cull — running this compose file doubles global supply.
Open a PR there once your relay has been up for ~24 hours.

View File

@@ -0,0 +1,26 @@
services:
numa-relay:
image: ghcr.io/razvandimescu/numa:latest
command: ["relay", "8443", "0.0.0.0"]
restart: unless-stopped
networks: [internal]
caddy:
image: caddy:2
ports:
- "80:80"
- "443:443"
volumes:
- ./Caddyfile:/etc/caddy/Caddyfile:ro
- caddy_data:/data
- caddy_config:/config
restart: unless-stopped
depends_on: [numa-relay]
networks: [internal]
networks:
internal:
volumes:
caddy_data:
caddy_config:

239
scripts/generate-blog-index.sh Executable file
View File

@@ -0,0 +1,239 @@
#!/usr/bin/env bash
set -euo pipefail
# Generate site/blog/index.html from blog/*.md frontmatter.
# Reads title, description, date from YAML frontmatter in each post.
# Sorts newest first (by date string — "April 2026" > "March 2026").
OUT="site/blog/index.html"
# Extract frontmatter fields from a markdown file
extract() {
local file="$1" field="$2"
sed -n '/^---$/,/^---$/p' "$file" | grep "^${field}:" | sed "s/^${field}: *//"
}
# Collect posts: "date|name|title|description" per line
posts=""
sources="blog/*.md"
if [ "${BLOG_INCLUDE_DRAFTS:-}" = "1" ] && ls drafts/*.md >/dev/null 2>&1; then
sources="blog/*.md drafts/*.md"
fi
for f in $sources; do
name=$(basename "$f" .md)
title=$(extract "$f" title)
desc=$(extract "$f" description)
date=$(extract "$f" date)
posts+="${date}|${name}|${title}|${desc}"$'\n'
done
# Sort by ISO date (YYYY-MM-DD), newest first
posts=$(echo "$posts" | grep -v '^$' | sort -t'|' -k1 -r)
# Format ISO date (YYYY-MM-DD) to "Month YYYY"
format_date() {
local months=(January February March April May June July August September October November December)
local y="${1%%-*}"
local m="${1#*-}"; m="${m%%-*}"; m=$((10#$m))
echo "${months[$((m-1))]} $y"
}
# Generate post list items
items=""
while IFS='|' read -r date name title desc; do
display_date=$(format_date "$date")
items+=" <li>
<a href=\"/blog/posts/${name}.html\">
<div class=\"post-title\">${title}</div>
<div class=\"post-desc\">${desc}</div>
<div class=\"post-date\">${display_date}</div>
</a>
</li>
"
done <<< "$posts"
# Write the full index.html — style matches the existing hand-maintained version
cat > "$OUT" << HTMLEOF
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Blog — Numa</title>
<meta name="description" content="Technical writing about DNS, Rust, and building infrastructure from scratch.">
<link rel="stylesheet" href="/fonts/fonts.css">
<style>
*, *::before, *::after { margin: 0; padding: 0; box-sizing: border-box; }
:root {
--bg-deep: #f5f0e8;
--bg-surface: #ece5da;
--bg-card: #faf7f2;
--amber: #c0623a;
--amber-dim: #9e4e2d;
--teal: #6b7c4e;
--text-primary: #2c2418;
--text-secondary: #6b5e4f;
--text-dim: #a39888;
--border: rgba(0, 0, 0, 0.08);
--font-display: 'Instrument Serif', Georgia, serif;
--font-body: 'DM Sans', system-ui, sans-serif;
--font-mono: 'JetBrains Mono', monospace;
}
body {
background: var(--bg-deep);
color: var(--text-primary);
font-family: var(--font-body);
font-weight: 400;
line-height: 1.7;
-webkit-font-smoothing: antialiased;
}
body::before {
content: '';
position: fixed;
inset: 0;
background-image: url("data:image/svg+xml,%3Csvg viewBox='0 0 256 256' xmlns='http://www.w3.org/2000/svg'%3E%3Cfilter id='n'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='0.9' numOctaves='4' stitchTiles='stitch'/%3E%3C/filter%3E%3Crect width='100%25' height='100%25' filter='url(%23n)' opacity='0.025'/%3E%3C/svg%3E");
pointer-events: none;
z-index: 9999;
}
.blog-nav {
padding: 1.5rem 2rem;
display: flex;
align-items: center;
gap: 1.5rem;
}
.blog-nav a {
font-family: var(--font-mono);
font-size: 0.75rem;
letter-spacing: 0.08em;
text-transform: uppercase;
color: var(--text-dim);
text-decoration: none;
transition: color 0.2s;
}
.blog-nav a:hover { color: var(--amber); }
.blog-nav .wordmark {
font-family: var(--font-display);
font-size: 1.4rem;
font-weight: 400;
color: var(--text-primary);
text-decoration: none;
text-transform: none;
letter-spacing: -0.02em;
}
.blog-nav .wordmark:hover { color: var(--amber); }
.blog-nav .sep {
color: var(--text-dim);
font-family: var(--font-mono);
font-size: 0.75rem;
}
.blog-index {
max-width: 720px;
margin: 0 auto;
padding: 3rem 2rem 6rem;
}
.blog-index h1 {
font-family: var(--font-display);
font-weight: 400;
font-size: 2.5rem;
margin-bottom: 3rem;
}
.post-list {
list-style: none;
}
.post-list li {
padding: 1.5rem 0;
border-bottom: 1px solid var(--border);
}
.post-list li:first-child {
border-top: 1px solid var(--border);
}
.post-list a {
text-decoration: none;
display: block;
}
.post-list .post-title {
font-family: var(--font-display);
font-size: 1.4rem;
font-weight: 600;
color: var(--text-primary);
line-height: 1.3;
margin-bottom: 0.4rem;
transition: color 0.2s;
}
.post-list a:hover .post-title {
color: var(--amber);
}
.post-list .post-desc {
font-size: 0.95rem;
color: var(--text-secondary);
line-height: 1.5;
margin-bottom: 0.4rem;
}
.post-list .post-date {
font-family: var(--font-mono);
font-size: 0.72rem;
color: var(--text-dim);
letter-spacing: 0.04em;
}
.blog-footer {
text-align: center;
padding: 3rem 2rem;
border-top: 1px solid var(--border);
max-width: 720px;
margin: 0 auto;
}
.blog-footer a {
font-family: var(--font-mono);
font-size: 0.75rem;
letter-spacing: 0.08em;
text-transform: uppercase;
color: var(--text-dim);
text-decoration: none;
margin: 0 1rem;
}
.blog-footer a:hover { color: var(--amber); }
</style>
</head>
<body>
<nav class="blog-nav">
<a href="/" class="wordmark">Numa</a>
<span class="sep">/</span>
<a href="/blog/">Blog</a>
</nav>
<main class="blog-index">
<h1>Blog</h1>
<ul class="post-list">
${items} </ul>
</main>
<footer class="blog-footer">
<a href="https://github.com/razvandimescu/numa">GitHub</a>
<a href="/">Home</a>
</footer>
</body>
</html>
HTMLEOF
echo " blog/index.html generated ($(echo "$posts" | wc -l | tr -d ' ') posts)"

View File

@@ -7,18 +7,19 @@
# The script:
# 1. Opens the dashboard in Chrome --app mode (clean, no address bar)
# 2. Generates DNS traffic (forward, cache hit, blocked)
# 3. Types "peekm" / "6419" into the Local Services form on camera
# 4. Shows LAN accessibility badge ("local only" / "LAN")
# 5. Checks a blocked domain
# 6. Opens peekm.numa to show the proxy working
# 7. Records via ffmpeg and converts to optimized GIF
# 3. Opens Phone Setup QR popover
# 4. Types "peekm" / "6419" into the Local Services form on camera
# 5. Shows LAN accessibility badge ("local only" / "LAN")
# 6. Checks a blocked domain
# 7. Opens peekm.numa to show the proxy working
# 8. Records via ffmpeg and converts to optimized GIF
set -euo pipefail
# --------------- Configuration ---------------
OUTPUT="${1:-assets/hero-demo.gif}"
PORT=5380
RECORD_SECONDS=20
RECORD_SECONDS=24
VIEWPORT_W=1800
VIEWPORT_H=1100
FPS=12
@@ -230,8 +231,16 @@ dig @127.0.0.1 github.com +short > /dev/null 2>&1
dig @127.0.0.1 ad.doubleclick.net +short > /dev/null 2>&1
sleep 3
# --------------- Scene 2: Add peekm service via UI (3-7s) ---------------
log "Scene 2: Adding peekm.numa service..."
# --------------- Scene 2: Phone Setup popover (3-7s) ---------------
log "Scene 2: Phone Setup QR popover..."
run_js "document.querySelector('#phoneSetup button').click();"
sleep 3
# Dismiss popover
run_js "document.getElementById('phoneSetupPopover').style.display = 'none';"
sleep 1
# --------------- Scene 3: Add peekm service via UI (7-11s) ---------------
log "Scene 3: Adding peekm.numa service..."
# Services panel is now first — scroll to it
run_js "
@@ -249,18 +258,18 @@ sleep 0.3
run_js "document.querySelector('#serviceForm .btn-add').click();"
sleep 2
# --------------- Scene 3: Open peekm.numa (7-11s) ---------------
log "Scene 3: Opening peekm.numa in browser..."
# --------------- Scene 4: Open peekm.numa (11-15s) ---------------
log "Scene 4: Opening peekm.numa in browser..."
open "http://peekm.numa/view/peekm/README.md" 2>/dev/null || true
sleep 4
# --------------- Scene 4: Back to dashboard (11-14s) ---------------
log "Scene 4: Back to dashboard — LAN badges + LOCAL queries visible..."
# --------------- Scene 5: Back to dashboard (15-18s) ---------------
log "Scene 5: Back to dashboard — LAN badges + LOCAL queries visible..."
osascript -e "tell application \"System Events\" to set frontmost of (first process whose unix id is $CHROME_PID) to true" 2>/dev/null || true
sleep 3
# --------------- Scene 5: Check Domain blocker (14-17s) ---------------
log "Scene 5: Check Domain — blocked tracker..."
# --------------- Scene 6: Check Domain blocker (18-21s) ---------------
log "Scene 6: Check Domain — blocked tracker..."
# Scroll down to blocking panel
run_js "
var blockPanel = document.getElementById('blockingPanel');
@@ -273,8 +282,8 @@ sleep 0.3
run_js "document.querySelector('#checkDomainInput').closest('form').querySelector('.btn').click();"
sleep 2
# --------------- Scene 6: Terminal-style dig overlay (17-20s) ---------------
log "Scene 6: dig proof overlay..."
# --------------- Scene 7: Terminal-style dig overlay (21-24s) ---------------
log "Scene 7: dig proof overlay..."
DIG_RESULT=$(dig @127.0.0.1 peekm.numa +short 2>/dev/null | head -1)
run_js "
var overlay = document.createElement('div');

View File

@@ -37,7 +37,7 @@ cargo update --workspace
git add Cargo.toml Cargo.lock
git commit -m "chore: bump version to $VERSION"
git tag "$TAG"
git push origin main --tags
git push origin main "$TAG"
echo
echo "Released $TAG — GitHub Actions will build, publish to crates.io, and create the release."

14
scripts/serve-site.sh Executable file
View File

@@ -0,0 +1,14 @@
#!/usr/bin/env bash
set -euo pipefail
PORT="${1:-9000}"
if [[ "${1:-}" == "--drafts" ]] || [[ "${2:-}" == "--drafts" ]]; then
PORT="${PORT//--drafts/9000}" # default port if --drafts was first arg
make blog-drafts
else
make blog
fi
echo "Serving site at http://localhost:$PORT"
cd site && python3 -m http.server "$PORT"

View File

@@ -74,6 +74,7 @@ body::before {
font-weight: 400;
color: var(--text-primary);
text-decoration: none;
text-transform: none;
letter-spacing: -0.02em;
}
.blog-nav .wordmark:hover { color: var(--amber); }
@@ -266,9 +267,105 @@ body::before {
.blog-footer a:hover { color: var(--amber); }
/* --- Responsive --- */
/* Hero metrics cards */
.hero-metrics {
display: grid;
grid-template-columns: repeat(3, 1fr);
gap: 1rem;
margin: 2rem 0;
}
.metric-card {
background: var(--bg-card);
border: 1px solid var(--border);
border-radius: 6px;
padding: 1.25rem;
text-align: center;
}
.metric-vs {
font-family: var(--font-mono);
font-size: 0.7rem;
letter-spacing: 0.08em;
text-transform: uppercase;
color: var(--text-dim);
margin-bottom: 0.5rem;
}
.metric-value {
font-family: var(--font-display);
font-size: 2.4rem;
font-weight: 400;
color: var(--amber);
line-height: 1.1;
}
.metric-label {
font-size: 0.82rem;
color: var(--text-secondary);
margin-top: 0.5rem;
line-height: 1.3;
}
/* Before/after progression */
.before-after {
display: flex;
align-items: center;
justify-content: center;
gap: 1.5rem;
margin: 2rem 0;
padding: 1.5rem;
background: var(--bg-card);
border: 1px solid var(--border);
border-radius: 6px;
}
.ba-item { text-align: center; }
.ba-label {
font-family: var(--font-mono);
font-size: 0.7rem;
letter-spacing: 0.08em;
text-transform: uppercase;
color: var(--text-dim);
margin-bottom: 0.3rem;
}
.ba-value {
font-family: var(--font-display);
font-size: 1.8rem;
font-weight: 400;
color: var(--text-secondary);
}
.ba-before {
text-decoration: line-through;
text-decoration-color: rgba(192, 98, 58, 0.4);
color: var(--text-dim);
}
.ba-after { color: var(--amber); }
.ba-arrow { font-size: 1.5rem; color: var(--text-dim); }
.ba-ref {
border-left: 1px solid var(--border);
padding-left: 1.5rem;
}
/* Spike highlight */
.spike {
background: rgba(192, 98, 58, 0.12);
padding: 0.15em 0.5em;
border-radius: 3px;
font-weight: 600;
color: var(--amber-dim);
}
/* Section dividers */
.article hr {
border: none;
height: 1px;
background: var(--border);
margin: 3rem auto;
max-width: 120px;
}
@media (max-width: 640px) {
.article { padding: 2rem 1.25rem 4rem; }
.article pre { padding: 1rem; margin-left: -0.5rem; margin-right: -0.5rem; border-radius: 0; border-left: none; border-right: none; }
.hero-metrics { grid-template-columns: 1fr; }
.before-after { flex-direction: column; gap: 0.75rem; }
.ba-ref { border-left: none; border-top: 1px solid var(--border); padding-left: 0; padding-top: 0.75rem; }
}
</style>
</head>

129
site/blog/dot-handshake.svg Normal file
View File

@@ -0,0 +1,129 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 360" font-family="'DM Sans', system-ui, sans-serif" font-size="12">
<defs>
<marker id="arr-amber" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#c0623a"/>
</marker>
<marker id="arr-dim" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#a39888"/>
</marker>
<filter id="shadow" x="-3%" y="-3%" width="106%" height="106%">
<feDropShadow dx="0" dy="1" stdDeviation="2" flood-opacity="0.06"/>
</filter>
</defs>
<!-- Background -->
<rect width="720" height="360" rx="8" fill="#faf7f2"/>
<!-- Title -->
<text x="360" y="32" text-anchor="middle" font-size="15" font-weight="600" fill="#2c2418" font-family="'Instrument Serif', Georgia, serif" letter-spacing="-0.02em">UDP vs DoT — one lookup, three scenarios</text>
<text x="360" y="50" text-anchor="middle" font-size="11" fill="#a39888">Time flows downward. Amber = DNS work. Gray = TCP/TLS handshake overhead.</text>
<!-- ==================== Column 1: Plain UDP ==================== -->
<g transform="translate(20, 0)">
<!-- Column header -->
<text x="90" y="84" text-anchor="middle" font-size="13" font-weight="600" fill="#2c2418">Plain UDP DNS</text>
<text x="90" y="101" text-anchor="middle" font-size="10" fill="#a39888" letter-spacing="0.06em">PORT 53 · CLEARTEXT</text>
<!-- Lane labels -->
<text x="25" y="128" font-size="10" fill="#6b5e4f">client</text>
<text x="133" y="128" font-size="10" fill="#6b5e4f">server</text>
<!-- Lanes -->
<line x1="35" y1="138" x2="35" y2="198" stroke="#d4cbba" stroke-width="1" stroke-dasharray="2 3"/>
<line x1="145" y1="138" x2="145" y2="198" stroke="#d4cbba" stroke-width="1" stroke-dasharray="2 3"/>
<!-- query -->
<line x1="37" y1="148" x2="143" y2="160" stroke="#c0623a" stroke-width="2" marker-end="url(#arr-amber)"/>
<text x="90" y="143" text-anchor="middle" font-size="10" fill="#9e4e2d" font-weight="500">query</text>
<!-- response -->
<line x1="143" y1="178" x2="37" y2="190" stroke="#c0623a" stroke-width="2" marker-end="url(#arr-amber)"/>
<text x="90" y="205" text-anchor="middle" font-size="10" fill="#9e4e2d" font-weight="500">response</text>
<!-- Total cost badge -->
<rect x="20" y="225" width="140" height="32" rx="4" fill="#faf7f2" stroke="#d4cbba" stroke-width="1" filter="url(#shadow)"/>
<text x="90" y="241" text-anchor="middle" font-size="9" fill="#a39888" letter-spacing="0.04em">TOTAL LATENCY</text>
<text x="90" y="253" text-anchor="middle" font-size="11" font-weight="600" fill="#c0623a" font-family="'JetBrains Mono', monospace">1 × RTT</text>
</g>
<!-- ==================== Column 2: DoT cold ==================== -->
<g transform="translate(270, 0)">
<!-- Column header -->
<text x="90" y="84" text-anchor="middle" font-size="13" font-weight="600" fill="#2c2418">DoT — first query</text>
<text x="90" y="101" text-anchor="middle" font-size="10" fill="#a39888" letter-spacing="0.06em">PORT 853 · NEW CONNECTION</text>
<!-- Lane labels -->
<text x="25" y="128" font-size="10" fill="#6b5e4f">client</text>
<text x="133" y="128" font-size="10" fill="#6b5e4f">server</text>
<!-- Lanes -->
<line x1="35" y1="138" x2="35" y2="308" stroke="#d4cbba" stroke-width="1" stroke-dasharray="2 3"/>
<line x1="145" y1="138" x2="145" y2="308" stroke="#d4cbba" stroke-width="1" stroke-dasharray="2 3"/>
<!-- === RTT 1: TCP handshake === -->
<!-- SYN -->
<line x1="37" y1="145" x2="143" y2="153" stroke="#a39888" stroke-width="1.5" marker-end="url(#arr-dim)"/>
<!-- SYN-ACK -->
<line x1="143" y1="163" x2="37" y2="171" stroke="#a39888" stroke-width="1.5" marker-end="url(#arr-dim)"/>
<!-- ACK -->
<line x1="37" y1="181" x2="143" y2="189" stroke="#a39888" stroke-width="1.5" marker-end="url(#arr-dim)"/>
<!-- Label + RTT marker -->
<text x="168" y="170" font-size="9" fill="#a39888" font-family="'JetBrains Mono', monospace">1 rtt</text>
<text x="90" y="143" text-anchor="middle" font-size="9" fill="#6b5e4f" font-style="italic">TCP handshake</text>
<!-- === RTT 2: TLS 1.3 handshake === -->
<!-- ClientHello -->
<line x1="37" y1="208" x2="143" y2="216" stroke="#a39888" stroke-width="1.5" marker-end="url(#arr-dim)"/>
<!-- ServerHello + Cert + Finished -->
<line x1="143" y1="226" x2="37" y2="234" stroke="#a39888" stroke-width="1.5" marker-end="url(#arr-dim)"/>
<!-- Label + RTT marker -->
<text x="168" y="222" font-size="9" fill="#a39888" font-family="'JetBrains Mono', monospace">2 rtt</text>
<text x="90" y="205" text-anchor="middle" font-size="9" fill="#6b5e4f" font-style="italic">TLS 1.3 handshake</text>
<!-- === RTT 3: DNS exchange === -->
<!-- query (piggybacked on ClientFinished) -->
<line x1="37" y1="253" x2="143" y2="261" stroke="#c0623a" stroke-width="2" marker-end="url(#arr-amber)"/>
<!-- response -->
<line x1="143" y1="271" x2="37" y2="279" stroke="#c0623a" stroke-width="2" marker-end="url(#arr-amber)"/>
<!-- Label + RTT marker -->
<text x="168" y="267" font-size="9" fill="#a39888" font-family="'JetBrains Mono', monospace">3 rtt</text>
<text x="90" y="250" text-anchor="middle" font-size="10" fill="#9e4e2d" font-weight="500">query + response</text>
<!-- Total cost badge -->
<rect x="20" y="295" width="140" height="32" rx="4" fill="#faf7f2" stroke="#d4cbba" stroke-width="1" filter="url(#shadow)"/>
<text x="90" y="311" text-anchor="middle" font-size="9" fill="#a39888" letter-spacing="0.04em">TOTAL LATENCY</text>
<text x="90" y="323" text-anchor="middle" font-size="11" font-weight="600" fill="#c0623a" font-family="'JetBrains Mono', monospace">3 × RTT</text>
</g>
<!-- ==================== Column 3: DoT reused ==================== -->
<g transform="translate(520, 0)">
<!-- Column header -->
<text x="90" y="84" text-anchor="middle" font-size="13" font-weight="600" fill="#2c2418">DoT — reused session</text>
<text x="90" y="101" text-anchor="middle" font-size="10" fill="#a39888" letter-spacing="0.06em">PORT 853 · PERSISTENT TCP/TLS</text>
<!-- Lane labels -->
<text x="25" y="128" font-size="10" fill="#6b5e4f">client</text>
<text x="133" y="128" font-size="10" fill="#6b5e4f">server</text>
<!-- Lanes -->
<line x1="35" y1="138" x2="35" y2="198" stroke="#d4cbba" stroke-width="1" stroke-dasharray="2 3"/>
<line x1="145" y1="138" x2="145" y2="198" stroke="#d4cbba" stroke-width="1" stroke-dasharray="2 3"/>
<!-- query -->
<line x1="37" y1="148" x2="143" y2="160" stroke="#c0623a" stroke-width="2" marker-end="url(#arr-amber)"/>
<text x="90" y="143" text-anchor="middle" font-size="10" fill="#9e4e2d" font-weight="500">query</text>
<!-- response -->
<line x1="143" y1="178" x2="37" y2="190" stroke="#c0623a" stroke-width="2" marker-end="url(#arr-amber)"/>
<text x="90" y="205" text-anchor="middle" font-size="10" fill="#9e4e2d" font-weight="500">response</text>
<!-- Total cost badge -->
<rect x="20" y="225" width="140" height="32" rx="4" fill="#faf7f2" stroke="#d4cbba" stroke-width="1" filter="url(#shadow)"/>
<text x="90" y="241" text-anchor="middle" font-size="9" fill="#a39888" letter-spacing="0.04em">TOTAL LATENCY</text>
<text x="90" y="253" text-anchor="middle" font-size="11" font-weight="600" fill="#c0623a" font-family="'JetBrains Mono', monospace">1 × RTT</text>
<!-- Tiny caption -->
<text x="90" y="280" text-anchor="middle" font-size="9" fill="#a39888" font-style="italic">(handshake amortized</text>
<text x="90" y="292" text-anchor="middle" font-size="9" fill="#a39888" font-style="italic">across the session)</text>
</g>
</svg>

After

Width:  |  Height:  |  Size: 7.7 KiB

View File

@@ -0,0 +1,92 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 330" font-family="'DM Sans', system-ui, sans-serif" font-size="12">
<defs>
<filter id="shadow" x="-3%" y="-3%" width="106%" height="106%">
<feDropShadow dx="0" dy="1" stdDeviation="2" flood-opacity="0.06"/>
</filter>
<!-- Diagonal hatch for "wasted" UDP timeout regions. Darker warm gray
base + slightly darker diagonal stripes at 45°. The stripe pattern
is the Gantt convention for "dead/blocked time" — it reads as
"this time was thrown away" without needing the legend. -->
<pattern id="wasted-hatch" patternUnits="userSpaceOnUse" width="7" height="7" patternTransform="rotate(-45)">
<rect width="7" height="7" fill="#8b7f6f"/>
<line x1="0" y1="0" x2="0" y2="7" stroke="#3d3427" stroke-width="1.6" opacity="0.38"/>
</pattern>
</defs>
<!-- Background -->
<rect width="720" height="330" rx="8" fill="#faf7f2"/>
<!-- Title -->
<text x="360" y="32" text-anchor="middle" font-size="15" font-weight="600" fill="#2c2418" font-family="'Instrument Serif', Georgia, serif" letter-spacing="-0.02em">TCP fallback with UDP auto-disable</text>
<text x="360" y="50" text-anchor="middle" font-size="11" fill="#a39888">Latency profile on an ISP that blocks outbound UDP:53</text>
<!-- Legend -->
<g transform="translate(160, 70)">
<rect width="14" height="12" rx="2" fill="url(#wasted-hatch)"/>
<text x="22" y="10" font-size="11" fill="#6b5e4f">UDP timeout — 800 ms wasted</text>
<rect x="220" width="14" height="12" rx="2" fill="#c0623a"/>
<text x="242" y="10" font-size="11" fill="#6b5e4f">TCP — successful exchange</text>
</g>
<!-- Time axis -->
<!-- bar area: x=90 to x=570 (480px), representing 0-1200ms, scale 0.4 px/ms -->
<line x1="90" y1="108" x2="570" y2="108" stroke="#d4cbba" stroke-width="1"/>
<!-- tick marks -->
<line x1="90" y1="106" x2="90" y2="112" stroke="#a39888" stroke-width="1"/>
<line x1="210" y1="106" x2="210" y2="112" stroke="#a39888" stroke-width="1"/>
<line x1="330" y1="106" x2="330" y2="112" stroke="#a39888" stroke-width="1"/>
<line x1="410" y1="106" x2="410" y2="112" stroke="#a39888" stroke-width="1"/>
<line x1="530" y1="106" x2="530" y2="112" stroke="#a39888" stroke-width="1"/>
<!-- tick labels -->
<text x="90" y="102" text-anchor="middle" font-size="9" fill="#a39888" font-family="'JetBrains Mono', monospace">0</text>
<text x="210" y="102" text-anchor="middle" font-size="9" fill="#a39888" font-family="'JetBrains Mono', monospace">300</text>
<text x="330" y="102" text-anchor="middle" font-size="9" fill="#a39888" font-family="'JetBrains Mono', monospace">600</text>
<text x="410" y="102" text-anchor="middle" font-size="9" fill="#a39888" font-family="'JetBrains Mono', monospace">800</text>
<text x="530" y="102" text-anchor="middle" font-size="9" fill="#a39888" font-family="'JetBrains Mono', monospace">1100 ms</text>
<!-- ============ Phase 1: UDP-first (wasted 800ms per query) ============ -->
<!-- Query 1 -->
<text x="82" y="135" text-anchor="end" font-size="11" fill="#6b5e4f">query 1</text>
<rect x="90" y="125" width="320" height="16" rx="2" fill="url(#wasted-hatch)"/>
<rect x="410" y="125" width="120" height="16" rx="2" fill="#c0623a"/>
<text x="540" y="137" font-size="10" fill="#6b5e4f" font-family="'JetBrains Mono', monospace">1,100 ms</text>
<!-- Query 2 -->
<text x="82" y="159" text-anchor="end" font-size="11" fill="#6b5e4f">query 2</text>
<rect x="90" y="149" width="320" height="16" rx="2" fill="url(#wasted-hatch)"/>
<rect x="410" y="149" width="120" height="16" rx="2" fill="#c0623a"/>
<text x="540" y="161" font-size="10" fill="#6b5e4f" font-family="'JetBrains Mono', monospace">1,100 ms</text>
<!-- Query 3 -->
<text x="82" y="183" text-anchor="end" font-size="11" fill="#6b5e4f">query 3</text>
<rect x="90" y="173" width="320" height="16" rx="2" fill="url(#wasted-hatch)"/>
<rect x="410" y="173" width="120" height="16" rx="2" fill="#c0623a"/>
<text x="540" y="185" font-size="10" fill="#6b5e4f" font-family="'JetBrains Mono', monospace">1,100 ms</text>
<!-- State-change divider -->
<line x1="90" y1="206" x2="570" y2="206" stroke="#6b7c4e" stroke-width="1" stroke-dasharray="4 3"/>
<rect x="200" y="198" width="260" height="18" rx="9" fill="#faf7f2" stroke="#6b7c4e" stroke-width="1" filter="url(#shadow)"/>
<text x="330" y="210" text-anchor="middle" font-size="10" fill="#566540" font-weight="500">3 consecutive failures → UDP auto-disabled</text>
<!-- ============ Phase 2: TCP-first (UDP skipped) ============ -->
<!-- Query 4 -->
<text x="82" y="235" text-anchor="end" font-size="11" fill="#6b5e4f">query 4</text>
<rect x="90" y="225" width="120" height="16" rx="2" fill="#c0623a"/>
<text x="220" y="237" font-size="10" fill="#6b5e4f" font-family="'JetBrains Mono', monospace">300 ms</text>
<!-- Query 5 -->
<text x="82" y="259" text-anchor="end" font-size="11" fill="#6b5e4f">query 5</text>
<rect x="90" y="249" width="120" height="16" rx="2" fill="#c0623a"/>
<text x="220" y="261" font-size="10" fill="#6b5e4f" font-family="'JetBrains Mono', monospace">300 ms</text>
<!-- Speedup callout -->
<g transform="translate(300, 246)">
<line x1="0" y1="-10" x2="0" y2="22" stroke="#6b7c4e" stroke-width="1" stroke-dasharray="2 2"/>
<text x="10" y="6" font-size="10" fill="#566540" font-style="italic">3.7× faster — no more UDP wait</text>
</g>
<!-- Footer caption -->
<text x="360" y="298" text-anchor="middle" font-size="10" fill="#a39888" font-style="italic">The flag resets on network change (LAN IP delta). Switch back to a clean network and UDP is tried again.</text>
</svg>

After

Width:  |  Height:  |  Size: 5.6 KiB

View File

@@ -67,6 +67,7 @@ body::before {
font-weight: 400;
color: var(--text-primary);
text-decoration: none;
text-transform: none;
letter-spacing: -0.02em;
}
.blog-nav .wordmark:hover { color: var(--amber); }
@@ -167,6 +168,20 @@ body::before {
<main class="blog-index">
<h1>Blog</h1>
<ul class="post-list">
<li>
<a href="/blog/posts/fixing-doh-tail-latency.html">
<div class="post-title">Fixing DNS tail latency with a 5-line config and a 50-line function</div>
<div class="post-desc">Periodic 40-140ms DoH spikes from hyper's dispatch channel. The fix was reqwest window tuning and request hedging — Dean & Barroso's "The Tail at Scale," applied to a DNS forwarder. Same ideas took cold recursive p99 from 2.3 seconds to 538ms.</div>
<div class="post-date">April 2026</div>
</a>
</li>
<li>
<a href="/blog/posts/dot-from-scratch.html">
<div class="post-title">DNS-over-TLS from Scratch in Rust</div>
<div class="post-desc">Building RFC 7858 on top of rustls — length-prefix framing, ALPN cross-protocol defense, and two bugs that only the strict clients caught.</div>
<div class="post-date">April 2026</div>
</a>
</li>
<li>
<a href="/blog/posts/dnssec-from-scratch.html">
<div class="post-title">Implementing DNSSEC from Scratch in Rust</div>
@@ -177,7 +192,7 @@ body::before {
<li>
<a href="/blog/posts/dns-from-scratch.html">
<div class="post-title">I Built a DNS Resolver from Scratch in Rust</div>
<div class="post-desc">How DNS actually works at the wire level — label compression, TTL tricks, DoH implementation, and what I learned building a resolver with zero DNS libraries.</div>
<div class="post-desc">How DNS actually works at the wire level — label compression, TTL tricks, DoH, and what surprised me building a resolver with zero DNS libraries.</div>
<div class="post-date">March 2026</div>
</a>
</li>

Binary file not shown.

After

Width:  |  Height:  |  Size: 310 KiB

View File

@@ -217,12 +217,18 @@ body {
min-width: 2px;
}
.path-bar-fill.forward { background: var(--amber); }
.path-bar-fill.upstream { background: var(--amber-dim); }
.path-bar-fill.recursive { background: var(--cyan); }
.path-bar-fill.cached { background: var(--teal); }
.path-bar-fill.local { background: var(--violet); }
.path-bar-fill.override { background: var(--emerald); }
.path-bar-fill.error { background: var(--rose); }
.path-bar-fill.blocked { background: var(--text-dim); }
.path-bar-fill.udp { background: var(--text-dim); }
.path-bar-fill.tcp { background: var(--violet); }
.path-bar-fill.dot { background: var(--emerald); }
.path-bar-fill.doh { background: var(--teal); }
.path-bar-fill.odoh { background: var(--violet-dim); }
.path-pct {
font-family: var(--font-mono);
font-size: 0.75rem;
@@ -281,6 +287,7 @@ body {
font-weight: 500;
}
.path-tag.FORWARD { background: rgba(192, 98, 58, 0.12); color: var(--amber-dim); }
.path-tag.UPSTREAM { background: rgba(160, 120, 72, 0.12); color: var(--amber-dim); }
.path-tag.RECURSIVE { background: rgba(74, 124, 138, 0.12); color: var(--cyan); }
.path-tag.CACHED { background: rgba(107, 124, 78, 0.12); color: var(--teal-dim); }
.path-tag.LOCAL { background: rgba(100, 116, 139, 0.12); color: var(--violet-dim); }
@@ -288,6 +295,11 @@ body {
.path-tag.SERVFAIL { background: rgba(181, 68, 58, 0.12); color: var(--rose); }
.path-tag.BLOCKED { background: rgba(163, 152, 136, 0.15); color: var(--text-dim); }
.path-tag.COALESCED { background: rgba(138, 104, 158, 0.12); color: var(--violet-dim); }
.path-tag.UDP { background: rgba(163, 152, 136, 0.15); color: var(--text-dim); }
.path-tag.TCP { background: rgba(100, 116, 139, 0.12); color: var(--violet-dim); }
.path-tag.DOT { background: rgba(82, 122, 82, 0.12); color: var(--emerald); }
.path-tag.DOH { background: rgba(107, 124, 78, 0.12); color: var(--teal); }
.src-tag { font-size: 0.6rem; color: var(--text-dim); letter-spacing: 0.02em; }
/* Sidebar panels */
.sidebar {
@@ -541,7 +553,11 @@ body {
@media (max-width: 700px) {
.stats-row { grid-template-columns: repeat(2, 1fr); }
.dashboard { padding: 1rem; }
.header { padding: 1rem; }
.header { padding: 0.8rem 1rem; }
.logo { font-size: 1.4rem; }
.tagline { display: none; }
#headerVersion { display: none; }
#phoneSetup { display: none; }
}
</style>
</head>
@@ -550,9 +566,24 @@ body {
<div class="header">
<div class="header-left">
<div class="logo">Numa</div>
<span id="headerVersion" style="font-family:var(--font-mono);font-size:0.68rem;color:var(--text-dim);"></span>
<div class="tagline">DNS that governs itself</div>
</div>
<div style="display:flex;align-items:center;gap:1.2rem;">
<div id="phoneSetup" style="position:relative;display:none;">
<button class="btn" onclick="togglePhoneSetup()" style="background:var(--bg-surface);color:var(--text-secondary);font-family:var(--font-mono);font-size:0.7rem;padding:0.35rem 0.6rem;border:1px solid var(--border);" title="Set up phone">Phone Setup</button>
<div id="phoneSetupPopover" style="display:none;position:absolute;top:calc(100% + 8px);right:0;z-index:100;background:var(--bg-card);border:1px solid var(--border);border-radius:10px;padding:1.2rem;width:260px;box-shadow:0 4px 20px rgba(0,0,0,0.08);">
<div style="font-size:0.7rem;font-weight:600;text-transform:uppercase;letter-spacing:0.1em;color:var(--text-secondary);margin-bottom:0.8rem;">Phone Setup</div>
<div id="qrContainer" style="display:flex;justify-content:center;margin-bottom:0.8rem;"></div>
<div id="phoneSetupLink" style="display:none;text-align:center;margin-bottom:0.8rem;"></div>
<div style="font-family:var(--font-mono);font-size:0.62rem;color:var(--text-dim);line-height:1.6;">
1. Scan QR &rarr; allow download<br>
2. Settings &rarr; Profile Downloaded &rarr; Install<br>
3. Settings &rarr; General &rarr; About &rarr;<br>
&nbsp;&nbsp;&nbsp;Certificate Trust Settings &rarr; toggle ON
</div>
</div>
</div>
<button class="btn" id="pauseBtn" style="background:var(--amber);color:white;font-family:var(--font-mono);font-size:0.7rem;display:none;">Pause 5m</button>
<button class="btn" id="toggleBtn" onclick="toggleBlocking()" style="background:var(--rose);color:white;font-family:var(--font-mono);font-size:0.7rem;display:none;"></button>
<div class="status-badge">
@@ -607,6 +638,26 @@ body {
</div>
</div>
<!-- Inbound wire (apps → numa) -->
<div class="panel">
<div class="panel-header">
<span class="panel-title">Inbound Wire <span style="color: var(--text-dim); font-weight: normal;">apps → numa</span></span>
<span class="panel-title" id="transportEncrypted" style="color: var(--text-dim)"></span>
</div>
<div class="panel-body" id="transportBars">
</div>
</div>
<!-- Outbound wire (numa → internet) -->
<div class="panel">
<div class="panel-header">
<span class="panel-title">Outbound Wire <span style="color: var(--text-dim); font-weight: normal;">numa → internet</span></span>
<span class="panel-title" id="upstreamWireEncrypted" style="color: var(--text-dim)"></span>
</div>
<div class="panel-body" id="upstreamWireBars">
</div>
</div>
<!-- Main grid: query log + sidebar -->
<div class="main-grid">
<!-- Query log -->
@@ -622,12 +673,21 @@ body {
<option value="RECURSIVE">recursive</option>
<option value="COALESCED">coalesced</option>
<option value="FORWARD">forward</option>
<option value="UPSTREAM">upstream</option>
<option value="CACHED">cached</option>
<option value="BLOCKED">blocked</option>
<option value="OVERRIDE">override</option>
<option value="LOCAL">local</option>
<option value="SERVFAIL">error</option>
</select>
<select id="logFilterTransport" onchange="applyLogFilter()"
style="font-family:var(--font-mono);font-size:0.7rem;padding:0.25rem 0.4rem;border:1px solid var(--border);border-radius:4px;background:var(--bg-surface);color:var(--text-secondary);outline:none;">
<option value="">all transports</option>
<option value="UDP">UDP</option>
<option value="TCP">TCP</option>
<option value="DOT">DoT</option>
<option value="DOH">DoH</option>
</select>
<span class="panel-title" id="queryCount" style="color: var(--text-dim)"></span>
</div>
</div>
@@ -639,6 +699,7 @@ body {
<th>Type</th>
<th>Domain</th>
<th>Path</th>
<th>Transport</th>
<th>Result</th>
<th>Latency</th>
</tr>
@@ -787,6 +848,41 @@ function formatTime(epoch) {
return d.toLocaleTimeString([], { hour12: false });
}
let mobilePort = 8765;
function togglePhoneSetup() {
const pop = document.getElementById('phoneSetupPopover');
const isOpen = pop.style.display !== 'none';
pop.style.display = isOpen ? 'none' : 'block';
if (!isOpen) {
if (window.innerWidth <= 700) {
document.getElementById('qrContainer').style.display = 'none';
const linkEl = document.getElementById('phoneSetupLink');
const host = window.location.hostname;
linkEl.style.display = 'block';
linkEl.innerHTML = `<a href="http://${host}:${mobilePort}/mobileconfig" style="display:inline-block;padding:0.5rem 1rem;background:var(--amber);color:white;border-radius:6px;text-decoration:none;font-family:var(--font-mono);font-size:0.75rem;">Install Profile</a>`;
} else {
fetch(API + '/qr').then(r => r.text()).then(svg => {
document.getElementById('qrContainer').innerHTML = svg;
}).catch(() => {
document.getElementById('qrContainer').innerHTML = '<div class="empty-state">Could not load QR</div>';
});
}
}
}
document.addEventListener('click', (e) => {
const setup = document.getElementById('phoneSetup');
if (setup && !setup.contains(e.target)) {
document.getElementById('phoneSetupPopover').style.display = 'none';
}
});
function shortSrc(addr) {
if (!addr) return '';
const ip = addr.replace(/:\d+$/, '');
if (ip === '127.0.0.1' || ip === '::1') return 'localhost';
return ip;
}
function formatRemaining(secs) {
if (secs == null) return 'permanent';
if (secs < 60) return `${secs}s left`;
@@ -857,8 +953,34 @@ function renderMemory(mem, stats) {
`;
}
function renderBarChart(containerId, defs, data, total) {
total = total || 1;
document.getElementById(containerId).innerHTML = defs
.filter(d => (data[d.key] || 0) > 0)
.map(d => {
const count = data[d.key] || 0;
const pct = ((count / total) * 100).toFixed(1);
return `
<div class="path-bar-row">
<span class="path-label">${d.label}</span>
<div class="path-bar-track">
<div class="path-bar-fill ${d.cls}" style="width: ${pct}%"></div>
</div>
<span class="path-pct">${pct}%</span>
</div>`;
}).join('');
}
function encryptionPct(data, encryptedKeys, allKeys) {
const total = allKeys.reduce((s, k) => s + (data[k] || 0), 0);
if (total === 0) return 0;
const encrypted = encryptedKeys.reduce((s, k) => s + (data[k] || 0), 0);
return Math.round((encrypted / total) * 100);
}
const PATH_DEFS = [
{ key: 'forwarded', label: 'Forward', cls: 'forward' },
{ key: 'upstream', label: 'Upstream', cls: 'upstream' },
{ key: 'recursive', label: 'Recursive', cls: 'recursive' },
{ key: 'cached', label: 'Cached', cls: 'cached' },
{ key: 'local', label: 'Local', cls: 'local' },
@@ -868,20 +990,39 @@ const PATH_DEFS = [
];
function renderPaths(queries) {
const total = queries.total || 1;
const container = document.getElementById('pathBars');
container.innerHTML = PATH_DEFS.map(p => {
const count = queries[p.key] || 0;
const pct = ((count / total) * 100).toFixed(1);
return `
<div class="path-bar-row">
<span class="path-label">${p.label}</span>
<div class="path-bar-track">
<div class="path-bar-fill ${p.cls}" style="width: ${pct}%"></div>
</div>
<span class="path-pct">${pct}%</span>
</div>`;
}).join('');
renderBarChart('pathBars', PATH_DEFS, queries, queries.total);
}
const TRANSPORT_DEFS = [
{ key: 'udp', label: 'UDP', cls: 'udp' },
{ key: 'tcp', label: 'TCP', cls: 'tcp' },
{ key: 'dot', label: 'DoT', cls: 'dot' },
{ key: 'doh', label: 'DoH', cls: 'doh' },
];
function renderTransport(transport) {
const total = (transport.udp + transport.tcp + transport.dot + transport.doh) || 1;
renderBarChart('transportBars', TRANSPORT_DEFS, transport, total);
const encPct = encryptionPct(transport, ['dot', 'doh'], ['udp', 'tcp', 'dot', 'doh']);
const el = document.getElementById('transportEncrypted');
el.textContent = `${encPct}% encrypted inbound`;
el.style.color = encPct >= 80 ? 'var(--emerald)' : encPct >= 50 ? 'var(--amber)' : 'var(--rose)';
}
const UPSTREAM_WIRE_DEFS = [
{ key: 'udp', label: 'UDP', cls: 'udp' },
{ key: 'doh', label: 'DoH', cls: 'doh' },
{ key: 'dot', label: 'DoT', cls: 'dot' },
{ key: 'odoh', label: 'ODoH', cls: 'odoh' },
];
function renderUpstreamWire(ut) {
const total = (ut.udp + ut.doh + ut.dot + ut.odoh) || 0;
renderBarChart('upstreamWireBars', UPSTREAM_WIRE_DEFS, ut, total || 1);
const encPct = encryptionPct(ut, ['doh', 'dot', 'odoh'], ['udp', 'doh', 'dot', 'odoh']);
const el = document.getElementById('upstreamWireEncrypted');
el.textContent = total > 0 ? `${encPct}% encrypted outbound` : '';
el.style.color = encPct >= 80 ? 'var(--emerald)' : encPct >= 50 ? 'var(--amber)' : 'var(--rose)';
}
function renderQueryLog(entries) {
@@ -892,6 +1033,7 @@ function renderQueryLog(entries) {
function applyLogFilter() {
const domainFilter = document.getElementById('logFilterDomain').value.trim().toLowerCase();
const pathFilter = document.getElementById('logFilterPath').value;
const transportFilter = document.getElementById('logFilterTransport').value;
let filtered = lastLogEntries;
if (domainFilter) {
@@ -900,6 +1042,9 @@ function applyLogFilter() {
if (pathFilter) {
filtered = filtered.filter(e => e.path === pathFilter);
}
if (transportFilter) {
filtered = filtered.filter(e => e.transport === transportFilter);
}
const tbody = document.getElementById('queryLogBody');
document.getElementById('queryCount').textContent =
@@ -912,11 +1057,12 @@ function applyLogFilter() {
? ` <button class="btn-delete" onclick="allowDomain('${e.domain}')" title="Allow this domain" style="color:var(--emerald);font-size:0.65rem;">allow</button>`
: '';
return `
<tr>
<td>${formatTime(e.timestamp_epoch)}</td>
<tr title="Source: ${e.src || 'unknown'}">
<td>${formatTime(e.timestamp_epoch)}<br><span class="src-tag">${shortSrc(e.src)}</span></td>
<td>${e.query_type}</td>
<td class="domain-cell" title="${e.domain}">${e.domain}${allowBtn}</td>
<td><span class="path-tag ${e.path}">${e.path}</span></td>
<td><span class="path-tag ${e.transport}">${e.transport}</span></td>
<td style="white-space:nowrap;"><span style="display:inline-block;width:15px;text-align:center;">${e.dnssec === 'secure' ? '<svg title="DNSSEC verified" width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="var(--emerald)" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-1px;"><path d="M12 22s8-4 8-10V5l-8-3-8 3v7c0 6 8 10 8 10z"/><path d="m9 12 2 2 4-4"/></svg>' : ''}</span>${e.rescode}</td>
<td>${e.latency_ms.toFixed(1)}ms</td>
</tr>`;
@@ -1024,16 +1170,23 @@ async function refresh() {
document.getElementById('totalQueries').textContent = formatNumber(q.total);
document.getElementById('uptime').textContent = formatUptime(stats.uptime_secs);
document.getElementById('uptimeSub').textContent = formatUptimeSub(stats.uptime_secs);
document.getElementById('headerVersion').textContent = stats.version ? 'v' + stats.version : '';
document.getElementById('footerUpstream').textContent = stats.upstream || '';
document.getElementById('footerConfig').textContent = stats.config_path || '';
document.getElementById('footerData').textContent = stats.data_dir || '';
const modeEl = document.getElementById('footerMode');
modeEl.textContent = stats.mode || '—';
modeEl.style.color = stats.mode === 'recursive' ? 'var(--emerald)' : 'var(--amber)';
document.getElementById('footerDnssec').textContent = stats.dnssec ? 'on' : 'off';
document.getElementById('footerDnssec').style.color = stats.dnssec ? 'var(--emerald)' : 'var(--text-dim)';
document.getElementById('footerSrtt').textContent = stats.srtt ? 'on' : 'off';
document.getElementById('footerSrtt').style.color = stats.srtt ? 'var(--emerald)' : 'var(--text-dim)';
if (!document.getElementById('footerLogs').textContent) {
const isWin = stats.data_dir && stats.data_dir.includes(':\\');
const isMac = stats.data_dir && stats.data_dir.includes('/usr/local/');
const logsEl = document.getElementById('footerLogs');
logsEl.textContent = isWin
? stats.data_dir + '\\numa.log'
: isMac ? '/usr/local/var/log/numa.log'
: 'journalctl -u numa -f';
}
// LAN status indicator
const lanEl = document.getElementById('lanToggle');
@@ -1050,6 +1203,14 @@ async function refresh() {
}
}
const phoneSetupEl = document.getElementById('phoneSetup');
if (stats.mobile && stats.mobile.enabled) {
phoneSetupEl.style.display = '';
mobilePort = stats.mobile.port;
} else {
phoneSetupEl.style.display = 'none';
}
document.getElementById('overrideCount').textContent = stats.overrides.active;
document.getElementById('blockedCount').textContent = formatNumber(q.blocked);
const bl = stats.blocking;
@@ -1083,22 +1244,26 @@ async function refresh() {
// QPS calculation
const now = Date.now();
const encPct = encryptionPct(stats.transport, ['dot', 'doh'], ['udp', 'tcp', 'dot', 'doh']);
if (prevTotal !== null && prevTime !== null) {
const dt = (now - prevTime) / 1000;
const dq = q.total - prevTotal;
const qps = dt > 0 ? (dq / dt).toFixed(1) : '0.0';
document.getElementById('qps').textContent = `~${qps}/s`;
const encTag = q.total > 0 ? ` · ${encPct}% enc` : '';
document.getElementById('qps').textContent = `~${qps}/s${encTag}`;
}
prevTotal = q.total;
prevTime = now;
// Cache hit rate
const answered = q.cached + q.forwarded + q.recursive + q.coalesced + q.local + q.overridden;
const answered = q.cached + q.forwarded + q.upstream + q.recursive + q.coalesced + q.local + q.overridden;
const hitRate = answered > 0 ? ((q.cached / answered) * 100).toFixed(1) : '0.0';
document.getElementById('cacheRate').textContent = hitRate + '%';
// Panels
renderPaths(q);
renderTransport(stats.transport);
renderUpstreamWire(stats.upstream_transport || { udp: 0, doh: 0, dot: 0, odoh: 0 });
renderQueryLog(logs);
renderOverrides(overrides);
renderCache(cache);
@@ -1108,6 +1273,7 @@ async function refresh() {
renderMemory(stats.memory, stats);
} catch (err) {
console.error('[numa dashboard] render failed:', err);
document.getElementById('statusDot').className = 'status-dot error';
document.getElementById('statusText').textContent = 'disconnected';
}
@@ -1222,6 +1388,7 @@ function renderBlockingInfo(info) {
}
function renderAllowlist(entries) {
if (document.activeElement && document.activeElement.id === 'allowDomainInput') return;
const el = document.getElementById('blockingAllowlist');
const count = entries.length;
el.innerHTML = `
@@ -1381,14 +1548,14 @@ refresh();
setInterval(refresh, 2000);
</script>
<div style="text-align:center;padding:0.8rem;font-family:var(--font-mono);font-size:0.68rem;color:var(--text-dim);">
<div style="text-align:center;padding:0.8rem 0.8rem 0.4rem;font-family:var(--font-mono);font-size:0.68rem;color:var(--text-dim);line-height:1.8;">
Config: <span id="footerConfig" style="user-select:all;color:var(--emerald);"></span>
· Data: <span id="footerData" style="user-select:all;color:var(--emerald);"></span>
· Upstream: <span id="footerUpstream" style="user-select:all;color:var(--emerald);"></span>
· Mode: <span id="footerMode" style="color:var(--text-dim);"></span>
· Logs: <span id="footerLogs" style="user-select:all;color:var(--emerald);"></span>
<br>
Upstream: <span id="footerUpstream" style="user-select:all;color:var(--emerald);"></span>
· DNSSEC: <span id="footerDnssec" style="color:var(--text-dim);"></span>
· SRTT: <span id="footerSrtt" style="color:var(--text-dim);"></span>
· Logs: <span style="user-select:all;color:var(--emerald);">macOS: /usr/local/var/log/numa.log · Linux: journalctl -u numa -f</span>
· <a href="https://github.com/razvandimescu/numa" target="_blank" rel="noopener" style="color:var(--amber);text-decoration:none;">GitHub</a>
</div>

View File

@@ -188,11 +188,50 @@ p.lead {
line-height: 1.8;
}
/* ===========================
TOP NAV
=========================== */
.site-nav {
padding: 1.5rem 2rem;
display: flex;
align-items: center;
gap: 1.5rem;
position: relative;
z-index: 10;
}
.site-nav a {
font-family: var(--font-mono);
font-size: 0.75rem;
letter-spacing: 0.08em;
text-transform: uppercase;
color: var(--text-dim);
text-decoration: none;
transition: color 0.2s ease;
}
.site-nav a:hover { color: var(--amber); }
.site-nav .wordmark {
font-family: var(--font-display);
font-size: 1.4rem;
font-weight: 400;
color: var(--text-primary);
text-transform: none;
letter-spacing: -0.02em;
}
.site-nav .wordmark:hover { color: var(--amber); }
.site-nav .sep {
color: var(--text-dim);
font-family: var(--font-mono);
font-size: 0.75rem;
}
/* ===========================
HERO
=========================== */
.hero {
min-height: 100vh;
min-height: calc(100vh - 5rem);
display: flex;
align-items: center;
position: relative;
@@ -1158,6 +1197,9 @@ footer .closing {
@media (max-width: 600px) {
section { padding: 4rem 0; }
.container { padding: 0 1.25rem; }
.site-nav { padding: 1rem 1.25rem; gap: 1rem; }
.site-nav .wordmark { font-size: 1.2rem; }
.hero { min-height: calc(100vh - 4rem); }
.network-grid { grid-template-columns: 1fr; }
.pipeline { flex-direction: column; align-items: stretch; gap: 0; }
.pipeline-arrow { transform: rotate(90deg); padding: 0.15rem 0; align-self: center; }
@@ -1171,6 +1213,14 @@ footer .closing {
</head>
<body>
<nav class="site-nav">
<a href="/" class="wordmark">Numa</a>
<span class="sep">/</span>
<a href="/blog/">Blog</a>
<span class="sep">/</span>
<a href="https://github.com/razvandimescu/numa" target="_blank" rel="noopener">GitHub</a>
</nav>
<!-- ==================== HERO ==================== -->
<section class="hero">
<div class="roman-bricks" aria-hidden="true"></div>
@@ -1243,6 +1293,8 @@ footer .closing {
<li>Ad &amp; tracker blocking &mdash; 385K+ domains, zero config</li>
<li>Recursive resolution &mdash; opt-in, resolve from root nameservers, no upstream needed</li>
<li>DNSSEC validation &mdash; chain-of-trust + NSEC/NSEC3 denial proofs (RSA, ECDSA, Ed25519)</li>
<li>DNS-over-TLS listener &mdash; encrypted DNS for phones and strict clients (RFC 7858 with ALPN defense)</li>
<li>Hostile-network resilience &mdash; TCP fallback with UDP auto-disable when ISPs block port 53</li>
<li>TTL-aware caching (sub-ms lookups)</li>
<li>Single binary, portable &mdash; macOS, Linux, and Windows</li>
</ul>
@@ -1261,7 +1313,7 @@ footer .closing {
</ul>
</div>
<div class="layer-card reveal reveal-delay-3">
<div class="layer-badge">Coming Next</div>
<div class="layer-badge">The Vision</div>
<h3>Self-Sovereign DNS</h3>
<ul>
<li>pkarr integration &mdash; DNS via Mainline DHT, no registrar needed</li>
@@ -1342,6 +1394,14 @@ footer .closing {
<td class="cross">No</td>
<td class="check">Root hints + full DNSSEC</td>
</tr>
<tr>
<td>DNSSEC validation</td>
<td class="muted">Passthrough</td>
<td class="muted">Cloud only</td>
<td class="muted">Cloud only</td>
<td class="muted">Passthrough</td>
<td class="check">Full chain-of-trust</td>
</tr>
<tr>
<td>Ad &amp; tracker blocking</td>
<td class="check">Yes</td>
@@ -1398,6 +1458,14 @@ footer .closing {
<td class="cross">No</td>
<td class="check">Built in (HTTP/2 + rustls)</td>
</tr>
<tr>
<td>DNS-over-TLS listener</td>
<td class="cross">No</td>
<td class="muted">Cloud only</td>
<td class="muted">Cloud only</td>
<td class="check">Yes (cert required)</td>
<td class="check">Self-signed or BYO</td>
</tr>
<tr>
<td>Conditional forwarding</td>
<td class="cross">No</td>
@@ -1567,11 +1635,14 @@ footer .closing {
<dt>Resolution Modes</dt>
<dd>Recursive (iterative from root hints, CNAME chasing, glue extraction) or Forward (DoH / plain UDP)</dd>
<dt>Listeners</dt>
<dd>UDP:53 + TCP:53 (plain DNS), DoT:853 (RFC 7858 + ALPN), HTTP proxy :80 / HTTPS proxy :443, dashboard :5380</dd>
<dt>DNSSEC</dt>
<dd>Chain-of-trust via ring &mdash; RSA/SHA-256, ECDSA P-256, Ed25519. NSEC/NSEC3 denial proofs. EDNS0 DO bit, 1232-byte payload (DNS Flag Day 2020).</dd>
<dt>Dependencies</dt>
<dd>19 runtime crates &mdash; tokio, axum, hyper, ring (DNSSEC), reqwest (DoH), rcgen + rustls (TLS), socket2 (multicast), serde, and more</dd>
<dd>A focused set &mdash; tokio, axum, hyper, ring (DNSSEC), reqwest (DoH), rcgen + rustls + tokio-rustls (TLS/DoT), socket2 (multicast), serde. No transitive DNS library.</dd>
<dt>Packet Format</dt>
<dd>RFC 1035 compliant. EDNS0 OPT pseudo-record. Parses A, AAAA, NS, CNAME, MX, SOA, SRV, HTTPS, DNSKEY, DS, RRSIG, NSEC, NSEC3.</dd>
@@ -1586,7 +1657,7 @@ footer .closing {
<span class="prompt">$</span> <span class="cmd">curl</span> <span class="flag">-fsSL</span> https://raw.githubusercontent.com/razvandimescu/numa/main/install.sh <span class="flag">|</span> <span class="cmd">sh</span>
<span class="comment"># Run</span>
<span class="prompt">$</span> <span class="cmd">sudo numa</span> <span class="comment"># bind to :53, :80, :5380</span>
<span class="prompt">$</span> <span class="cmd">sudo numa</span> <span class="comment"># bind :53, :80, :443, :853, :5380</span>
<span class="prompt">$</span> <span class="cmd">dig</span> <span class="flag">@127.0.0.1</span> google.com <span class="comment"># test resolution</span>
<span class="prompt">$</span> <span class="cmd">open</span> http://localhost:5380 <span class="comment"># dashboard</span>
<span class="prompt">$</span> <span class="cmd">curl</span> <span class="flag">-X POST</span> localhost:5380/services \
@@ -1639,16 +1710,28 @@ footer .closing {
<span class="phase">Phase 7</span>
<span class="phase-desc">DNSSEC validation &mdash; chain-of-trust, NSEC/NSEC3 denial proofs, RSA + ECDSA + Ed25519</span>
</div>
<div class="roadmap-item phase-teal">
<div class="roadmap-item done">
<span class="phase">Phase 8</span>
<span class="phase-desc">Hostile-network resilience &mdash; TCP fallback with UDP auto-disable when ISPs block :53, RFC 7816 query minimization</span>
</div>
<div class="roadmap-item done">
<span class="phase">Phase 9</span>
<span class="phase-desc">Windows support &mdash; cross-platform install/uninstall, <code>netsh</code> DNS config, service integration</span>
</div>
<div class="roadmap-item done">
<span class="phase">Phase 10</span>
<span class="phase-desc">DNS-over-TLS listener (RFC 7858) &mdash; ALPN enforcement, persistent connections, self-signed or BYO cert</span>
</div>
<div class="roadmap-item phase-teal">
<span class="phase">Phase 11</span>
<span class="phase-desc">pkarr integration &mdash; self-sovereign DNS via Mainline DHT, no registrar needed</span>
</div>
<div class="roadmap-item phase-teal">
<span class="phase">Phase 9</span>
<span class="phase">Phase 12</span>
<span class="phase-desc">Global .numa names &mdash; self-publish, DHT-backed, first-come-first-served</span>
</div>
<div class="roadmap-item phase-teal">
<span class="phase">Phase 10</span>
<span class="phase">Phase 13</span>
<span class="phase-desc">.onion bridge &mdash; human-readable Tor naming via Ed25519 same-key binding</span>
</div>
</div>

View File

@@ -57,6 +57,7 @@ pub fn router(ctx: Arc<ServerCtx>) -> Router {
.route("/services/{name}/routes", post(add_route))
.route("/services/{name}/routes", delete(remove_route))
.route("/ca.pem", get(serve_ca))
.route("/qr", get(serve_qr))
.route("/fonts/fonts.css", get(serve_fonts_css))
.route(
"/fonts/dm-sans-latin.woff2",
@@ -151,6 +152,7 @@ struct QueryLogResponse {
domain: String,
query_type: String,
path: String,
transport: String,
rescode: String,
latency_ms: f64,
dnssec: String,
@@ -158,6 +160,7 @@ struct QueryLogResponse {
#[derive(Serialize)]
struct StatsResponse {
version: &'static str,
uptime_secs: u64,
upstream: String,
mode: &'static str, // "recursive" or "forward" — never "auto" at runtime
@@ -166,13 +169,38 @@ struct StatsResponse {
dnssec: bool,
srtt: bool,
queries: QueriesStats,
transport: TransportStats,
upstream_transport: UpstreamTransportStats,
cache: CacheStats,
overrides: OverrideStats,
blocking: BlockingStatsResponse,
lan: LanStatsResponse,
mobile: MobileStatsResponse,
memory: MemoryStats,
}
#[derive(Serialize)]
struct TransportStats {
udp: u64,
tcp: u64,
dot: u64,
doh: u64,
}
#[derive(Serialize)]
struct UpstreamTransportStats {
udp: u64,
doh: u64,
dot: u64,
odoh: u64,
}
#[derive(Serialize)]
struct MobileStatsResponse {
enabled: bool,
port: u16,
}
#[derive(Serialize)]
struct LanStatsResponse {
enabled: bool,
@@ -183,6 +211,7 @@ struct LanStatsResponse {
struct QueriesStats {
total: u64,
forwarded: u64,
upstream: u64,
recursive: u64,
coalesced: u64,
cached: u64,
@@ -403,9 +432,12 @@ async fn diagnose(
}
// Check upstream (async, no locks held)
let upstream = ctx.upstream.lock().unwrap().clone();
let (upstream_matched, upstream_detail) =
forward_query_for_diagnose(&domain_lower, &upstream, ctx.timeout).await;
let upstream = ctx.upstream_pool.lock().unwrap().preferred().cloned();
let (upstream_matched, upstream_detail) = if let Some(ref u) = upstream {
forward_query_for_diagnose(&domain_lower, u, ctx.timeout).await
} else {
(false, "no upstream configured".to_string())
};
steps.push(DiagnoseStep {
source: "upstream".to_string(),
matched: upstream_matched,
@@ -472,6 +504,7 @@ async fn query_log(
domain: e.domain.clone(),
query_type: e.query_type.as_str().to_string(),
path: e.path.as_str().to_string(),
transport: e.transport.as_str().to_string(),
rescode: e.rescode.as_str().to_string(),
latency_ms: e.latency_us as f64 / 1000.0,
dnssec: e.dnssec.as_str().to_string(),
@@ -512,10 +545,11 @@ async fn stats(State(ctx): State<Arc<ServerCtx>>) -> Json<StatsResponse> {
let upstream = if ctx.upstream_mode == crate::config::UpstreamMode::Recursive {
"recursive (root hints)".to_string()
} else {
ctx.upstream.lock().unwrap().to_string()
ctx.upstream_pool.lock().unwrap().label()
};
Json(StatsResponse {
version: crate::version(),
uptime_secs: snap.uptime_secs,
upstream,
mode: ctx.upstream_mode.as_str(),
@@ -526,6 +560,7 @@ async fn stats(State(ctx): State<Arc<ServerCtx>>) -> Json<StatsResponse> {
queries: QueriesStats {
total: snap.total,
forwarded: snap.forwarded,
upstream: snap.upstream,
recursive: snap.recursive,
coalesced: snap.coalesced,
cached: snap.cached,
@@ -534,6 +569,18 @@ async fn stats(State(ctx): State<Arc<ServerCtx>>) -> Json<StatsResponse> {
blocked: snap.blocked,
errors: snap.errors,
},
transport: TransportStats {
udp: snap.transport_udp,
tcp: snap.transport_tcp,
dot: snap.transport_dot,
doh: snap.transport_doh,
},
upstream_transport: UpstreamTransportStats {
udp: snap.upstream_transport_udp,
doh: snap.upstream_transport_doh,
dot: snap.upstream_transport_dot,
odoh: snap.upstream_transport_odoh,
},
cache: CacheStats {
entries: cache_len,
max_entries: cache_max,
@@ -551,6 +598,10 @@ async fn stats(State(ctx): State<Arc<ServerCtx>>) -> Json<StatsResponse> {
enabled: ctx.lan_enabled,
peers: ctx.lan_peers.lock().unwrap().list().len(),
},
mobile: MobileStatsResponse {
enabled: ctx.mobile_enabled,
port: ctx.mobile_port,
},
memory: MemoryStats {
cache_bytes,
blocklist_bytes,
@@ -592,8 +643,19 @@ async fn flush_cache_domain(
StatusCode::NO_CONTENT
}
async fn health() -> Json<serde_json::Value> {
Json(serde_json::json!({ "status": "ok" }))
/// Enriched `/health` handler shared between the main API and the mobile API.
///
/// Returns the cached `HealthMeta` assembled with live fields (LAN IP,
/// uptime). Backward compatible with the previous minimal response in
/// that `status` is still the first field and `"ok"` is still the value.
/// The iOS companion app's `HealthInfo` Swift struct decodes the full
/// response; any HTTP client asserting only on `"status"` keeps working.
pub async fn health(State(ctx): State<Arc<ServerCtx>>) -> Json<crate::health::HealthResponse> {
let lan_ip = Some(*ctx.lan_ip.lock().unwrap());
Json(crate::health::HealthResponse::build(
&ctx.health_meta,
lan_ip,
))
}
// --- Blocking handlers ---
@@ -905,12 +967,8 @@ async fn remove_route(
}
}
async fn serve_ca(State(ctx): State<Arc<ServerCtx>>) -> Result<impl IntoResponse, StatusCode> {
let ca_path = ctx.data_dir.join(crate::tls::CA_FILE_NAME);
let bytes = tokio::task::spawn_blocking(move || std::fs::read(ca_path))
.await
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?
.map_err(|_| StatusCode::NOT_FOUND)?;
pub async fn serve_ca(State(ctx): State<Arc<ServerCtx>>) -> Result<impl IntoResponse, StatusCode> {
let pem = ctx.ca_pem.as_deref().ok_or(StatusCode::NOT_FOUND)?;
Ok((
[
(header::CONTENT_TYPE, "application/x-pem-file"),
@@ -920,7 +978,29 @@ async fn serve_ca(State(ctx): State<Arc<ServerCtx>>) -> Result<impl IntoResponse
),
(header::CACHE_CONTROL, "public, max-age=86400"),
],
bytes,
pem.to_string(),
))
}
async fn serve_qr(State(ctx): State<Arc<ServerCtx>>) -> Result<impl IntoResponse, StatusCode> {
if !ctx.mobile_enabled {
return Err(StatusCode::NOT_FOUND);
}
let lan_ip = *ctx.lan_ip.lock().unwrap();
let url = format!("http://{}:{}/mobileconfig", lan_ip, ctx.mobile_port);
let code = qrcode::QrCode::new(&url).map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
let svg = code
.render::<qrcode::render::svg::Color>()
.min_dimensions(180, 180)
.dark_color(qrcode::render::svg::Color("#2c2418"))
.light_color(qrcode::render::svg::Color("#faf7f2"))
.build();
Ok((
[
(header::CONTENT_TYPE, "image/svg+xml"),
(header::CACHE_CONTROL, "no-store"),
],
svg,
))
}
@@ -959,44 +1039,10 @@ mod tests {
use super::*;
use axum::body::Body;
use http::Request;
use std::sync::{Mutex, RwLock};
use tower::ServiceExt;
async fn test_ctx() -> Arc<ServerCtx> {
let socket = tokio::net::UdpSocket::bind("127.0.0.1:0").await.unwrap();
Arc::new(ServerCtx {
socket,
zone_map: std::collections::HashMap::new(),
cache: RwLock::new(crate::cache::DnsCache::new(100, 60, 86400)),
stats: Mutex::new(crate::stats::ServerStats::new()),
overrides: RwLock::new(crate::override_store::OverrideStore::new()),
blocklist: RwLock::new(crate::blocklist::BlocklistStore::new()),
query_log: Mutex::new(crate::query_log::QueryLog::new(100)),
services: Mutex::new(crate::service_store::ServiceStore::new()),
lan_peers: Mutex::new(crate::lan::PeerStore::new(90)),
forwarding_rules: Vec::new(),
upstream: Mutex::new(crate::forward::Upstream::Udp(
"127.0.0.1:53".parse().unwrap(),
)),
upstream_auto: false,
upstream_port: 53,
lan_ip: Mutex::new(std::net::Ipv4Addr::LOCALHOST),
timeout: std::time::Duration::from_secs(3),
proxy_tld: "numa".to_string(),
proxy_tld_suffix: ".numa".to_string(),
lan_enabled: false,
config_path: "/tmp/test-numa.toml".to_string(),
config_found: false,
config_dir: std::path::PathBuf::from("/tmp"),
data_dir: std::path::PathBuf::from("/tmp"),
tls_config: None,
upstream_mode: crate::config::UpstreamMode::Forward,
root_hints: Vec::new(),
srtt: RwLock::new(crate::srtt::SrttCache::new(true)),
inflight: Mutex::new(std::collections::HashMap::new()),
dnssec_enabled: false,
dnssec_strict: false,
})
Arc::new(crate::testutil::test_ctx().await)
}
#[tokio::test]

View File

@@ -1,5 +1,5 @@
use std::collections::HashSet;
use std::time::Instant;
use std::time::{Duration, Instant};
use log::{info, warn};
@@ -81,66 +81,70 @@ impl BlocklistStore {
if !self.enabled {
return false;
}
if let Some(until) = self.paused_until {
if Instant::now() < until {
return false;
}
}
if self.allowlist.contains(domain) {
let domain = Self::normalize(domain);
if Self::find_in_set(&domain, &self.allowlist).is_some() {
return false;
}
if self.domains.contains(domain) {
return true;
}
// Walk up: ads.tracker.example.com → tracker.example.com → example.com
let mut d = domain;
while let Some(dot) = d.find('.') {
d = &d[dot + 1..];
if self.allowlist.contains(d) {
return false;
}
if self.domains.contains(d) {
return true;
}
}
false
Self::find_in_set(&domain, &self.domains).is_some()
}
/// Check if a domain is blocked and return the reason.
pub fn check(&self, domain: &str) -> BlockCheckResult {
let domain = domain.to_lowercase();
if !self.enabled {
return BlockCheckResult::disabled();
}
if self.allowlist.contains(&domain) {
return BlockCheckResult::allowed(&domain, "exact match in allowlist");
if let Some(until) = self.paused_until {
if Instant::now() < until {
return BlockCheckResult::disabled();
}
}
if self.domains.contains(&domain) {
return BlockCheckResult::blocked(&domain, "exact match in blocklist");
let domain = Self::normalize(domain);
if let Some(matched) = Self::find_in_set(&domain, &self.allowlist) {
let reason = if matched == domain {
"exact match in allowlist"
} else {
"parent domain in allowlist"
};
return BlockCheckResult::allowed(matched, reason);
}
let mut d = domain.as_str();
while let Some(dot) = d.find('.') {
d = &d[dot + 1..];
if self.allowlist.contains(d) {
return BlockCheckResult::allowed(d, "parent domain in allowlist");
}
if self.domains.contains(d) {
return BlockCheckResult::blocked(d, "parent domain in blocklist");
}
if let Some(matched) = Self::find_in_set(&domain, &self.domains) {
let reason = if matched == domain {
"exact match in blocklist"
} else {
"parent domain in blocklist"
};
return BlockCheckResult::blocked(matched, reason);
}
BlockCheckResult::not_blocked()
}
fn normalize(domain: &str) -> String {
domain.to_lowercase().trim_end_matches('.').to_string()
}
fn find_in_set<'a>(domain: &'a str, set: &HashSet<String>) -> Option<&'a str> {
if set.contains(domain) {
return Some(domain);
}
let mut d = domain;
while let Some(dot) = d.find('.') {
d = &d[dot + 1..];
if set.contains(d) {
return Some(d);
}
}
None
}
/// Atomically swap in a new domain set. Build the set outside the lock,
/// then call this to swap — keeps lock hold time sub-microsecond.
pub fn swap_domains(&mut self, domains: HashSet<String>, sources: Vec<String>) {
@@ -172,11 +176,11 @@ impl BlocklistStore {
}
pub fn add_to_allowlist(&mut self, domain: &str) {
self.allowlist.insert(domain.to_lowercase());
self.allowlist.insert(Self::normalize(domain));
}
pub fn remove_from_allowlist(&mut self, domain: &str) -> bool {
self.allowlist.remove(&domain.to_lowercase())
self.allowlist.remove(&Self::normalize(domain))
}
pub fn allowlist(&self) -> Vec<String> {
@@ -247,6 +251,97 @@ pub fn parse_blocklist(text: &str) -> HashSet<String> {
mod tests {
use super::*;
fn store_with(domains: &[&str], allowlist: &[&str]) -> BlocklistStore {
let mut store = BlocklistStore::new();
store.swap_domains(domains.iter().map(|s| s.to_string()).collect(), vec![]);
for d in allowlist {
store.add_to_allowlist(d);
}
store
}
#[test]
fn exact_block() {
let store = store_with(&["ads.example.com"], &[]);
assert!(store.is_blocked("ads.example.com"));
assert!(!store.is_blocked("example.com"));
}
#[test]
fn parent_block_covers_subdomain() {
let store = store_with(&["tracker.com"], &[]);
assert!(store.is_blocked("tracker.com"));
assert!(store.is_blocked("www.tracker.com"));
assert!(store.is_blocked("deep.sub.tracker.com"));
}
#[test]
fn exact_allowlist_unblocks() {
let store = store_with(&["ads.example.com"], &["ads.example.com"]);
assert!(!store.is_blocked("ads.example.com"));
}
#[test]
fn parent_allowlist_unblocks_subdomain() {
let store = store_with(&["example.com", "www.example.com"], &["example.com"]);
assert!(!store.is_blocked("example.com"));
assert!(!store.is_blocked("www.example.com"));
assert!(!store.is_blocked("sub.deep.example.com"));
}
#[test]
fn allowlist_does_not_unblock_sibling() {
let store = store_with(
&["www.example.com", "ads.example.com"],
&["www.example.com"],
);
assert!(!store.is_blocked("www.example.com"));
assert!(store.is_blocked("ads.example.com"));
}
#[test]
fn check_reports_parent_allowlist() {
let store = store_with(
&["goatcounter.com", "www.goatcounter.com"],
&["goatcounter.com"],
);
let result = store.check("www.goatcounter.com");
assert!(!result.blocked);
assert_eq!(result.matched_rule.as_deref(), Some("goatcounter.com"));
}
#[test]
fn disabled_never_blocks() {
let mut store = store_with(&["ads.example.com"], &[]);
store.set_enabled(false);
assert!(!store.is_blocked("ads.example.com"));
}
#[test]
fn trailing_dot_normalized() {
let store = store_with(&["ads.example.com"], &["safe.example.com"]);
assert!(store.is_blocked("ads.example.com."));
assert!(!store.is_blocked("safe.example.com."));
let result = store.check("ads.example.com.");
assert!(result.blocked);
}
#[test]
fn case_insensitive() {
let store = store_with(&["ads.example.com"], &["safe.example.com"]);
assert!(store.is_blocked("ADS.Example.COM"));
assert!(!store.is_blocked("Safe.Example.COM"));
}
#[test]
fn domain_in_neither_list() {
let store = store_with(&["ads.example.com"], &[]);
let result = store.check("clean.example.org");
assert!(!result.blocked);
assert_eq!(result.reason, "not in blocklist");
assert!(result.matched_rule.is_none());
}
#[test]
fn heap_bytes_grows_with_domains() {
let mut store = BlocklistStore::new();
@@ -260,27 +355,139 @@ mod tests {
}
}
const RETRY_DELAYS_SECS: &[u64] = &[2, 10, 30];
pub async fn download_blocklists(lists: &[String]) -> Vec<(String, String)> {
let client = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(30))
.timeout(Duration::from_secs(30))
.gzip(true)
.build()
.unwrap_or_default();
let mut results = Vec::new();
let fetches = lists.iter().map(|url| {
let client = &client;
async move {
let text = fetch_with_retry(client, url).await?;
info!("downloaded blocklist: {} ({} bytes)", url, text.len());
Some((url.clone(), text))
}
});
futures::future::join_all(fetches)
.await
.into_iter()
.flatten()
.collect()
}
for url in lists {
match client.get(url).send().await {
Ok(resp) => match resp.text().await {
Ok(text) => {
info!("downloaded blocklist: {} ({} bytes)", url, text.len());
results.push((url.clone(), text));
}
Err(e) => warn!("failed to read blocklist body {}: {}", url, e),
},
Err(e) => warn!("failed to download blocklist {}: {}", url, e),
async fn fetch_with_retry(client: &reqwest::Client, url: &str) -> Option<String> {
fetch_with_retry_delays(client, url, RETRY_DELAYS_SECS).await
}
async fn fetch_with_retry_delays(
client: &reqwest::Client,
url: &str,
delays: &[u64],
) -> Option<String> {
let total = delays.len() + 1;
for attempt in 1..=total {
match fetch_once(client, url).await {
Ok(text) => return Some(text),
Err(msg) if attempt < total => {
let delay = delays[attempt - 1];
warn!(
"blocklist {} attempt {}/{} failed: {} — retrying in {}s",
url, attempt, total, msg, delay
);
tokio::time::sleep(Duration::from_secs(delay)).await;
}
Err(msg) => {
warn!(
"blocklist {} attempt {}/{} failed: {} — giving up",
url, attempt, total, msg
);
}
}
}
results
None
}
async fn fetch_once(client: &reqwest::Client, url: &str) -> Result<String, String> {
let resp = client
.get(url)
.send()
.await
.map_err(|e| format_error_chain(&e))?;
resp.text().await.map_err(|e| format_error_chain(&e))
}
fn format_error_chain(e: &(dyn std::error::Error + 'static)) -> String {
let mut parts = vec![e.to_string()];
let mut src = e.source();
while let Some(s) = src {
parts.push(s.to_string());
src = s.source();
}
parts.join(": ")
}
#[cfg(test)]
mod retry_tests {
use super::*;
use std::net::SocketAddr;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpListener;
async fn flaky_http_server(drop_first_n: usize, body: &'static str) -> SocketAddr {
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let addr = listener.local_addr().unwrap();
tokio::spawn(async move {
for _ in 0..drop_first_n {
if let Ok((sock, _)) = listener.accept().await {
drop(sock);
}
}
loop {
let Ok((mut sock, _)) = listener.accept().await else {
return;
};
tokio::spawn(async move {
let mut buf = [0u8; 2048];
let _ = sock.read(&mut buf).await;
let response = format!(
"HTTP/1.1 200 OK\r\nContent-Length: {}\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\n{}",
body.len(),
body,
);
let _ = sock.write_all(response.as_bytes()).await;
let _ = sock.shutdown().await;
});
}
});
addr
}
fn zero_delays() -> Vec<u64> {
vec![0; RETRY_DELAYS_SECS.len()]
}
#[tokio::test]
async fn retry_succeeds_on_final_attempt() {
let body = "ads.example.com\ntracker.example.net\n";
let delays = zero_delays();
let addr = flaky_http_server(delays.len(), body).await;
let client = reqwest::Client::new();
let url = format!("http://{addr}/");
let result = fetch_with_retry_delays(&client, &url, &delays).await;
assert_eq!(result.as_deref(), Some(body));
}
#[tokio::test]
async fn retry_gives_up_when_all_attempts_fail() {
let delays = zero_delays();
let addr = flaky_http_server(delays.len() + 2, "unreachable").await;
let client = reqwest::Client::new();
let url = format!("http://{addr}/");
let result = fetch_with_retry_delays(&client, &url, &delays).await;
assert_eq!(result, None);
}
}

View File

@@ -1,9 +1,26 @@
use std::collections::HashMap;
use std::time::{Duration, Instant};
use crate::buffer::BytePacketBuffer;
use crate::packet::DnsPacket;
use crate::question::QueryType;
use crate::record::DnsRecord;
use crate::wire::WireMeta;
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Freshness {
/// Within TTL, no action needed.
Fresh,
/// Within TTL but <10% remaining — trigger background prefetch.
NearExpiry,
/// Past TTL but within stale window — serve with TTL=1, trigger background refresh.
Stale,
}
impl Freshness {
pub fn needs_refresh(self) -> bool {
matches!(self, Freshness::NearExpiry | Freshness::Stale)
}
}
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum DnssecStatus {
@@ -26,14 +43,16 @@ impl DnssecStatus {
}
struct CacheEntry {
packet: DnsPacket,
wire: Vec<u8>,
meta: WireMeta,
inserted_at: Instant,
ttl: Duration,
dnssec_status: DnssecStatus,
}
/// DNS cache using a two-level map (domain -> query_type -> entry) so that
/// lookups can borrow `&str` instead of allocating a `String` key.
const STALE_WINDOW: Duration = Duration::from_secs(3600);
/// DNS cache with serve-stale (RFC 8767). Stores raw wire bytes.
pub struct DnsCache {
entries: HashMap<String, HashMap<QueryType, CacheEntry>>,
entry_count: usize,
@@ -53,54 +72,60 @@ impl DnsCache {
}
}
/// Read-only lookup — expired entries are left in place (cleaned up on insert).
pub fn lookup(&self, domain: &str, qtype: QueryType) -> Option<DnsPacket> {
self.lookup_with_status(domain, qtype).map(|(pkt, _)| pkt)
}
pub fn lookup_with_status(
/// Look up cached wire bytes, patching ID and TTLs in the returned copy.
/// Implements serve-stale (RFC 8767): expired entries within STALE_WINDOW
/// are returned with TTL=1 and `stale=true` so callers can revalidate.
pub fn lookup_wire(
&self,
domain: &str,
qtype: QueryType,
) -> Option<(DnsPacket, DnssecStatus)> {
new_id: u16,
) -> Option<(Vec<u8>, DnssecStatus, Freshness)> {
let type_map = self.entries.get(domain)?;
let entry = type_map.get(&qtype)?;
let elapsed = entry.inserted_at.elapsed();
if elapsed >= entry.ttl {
let (remaining, freshness) = if elapsed < entry.ttl {
let secs = (entry.ttl - elapsed).as_secs() as u32;
let f = if elapsed * 10 >= entry.ttl * 9 {
Freshness::NearExpiry
} else {
Freshness::Fresh
};
(secs.max(1), f)
} else if elapsed < entry.ttl + STALE_WINDOW {
(1, Freshness::Stale)
} else {
return None;
}
};
let remaining_secs = (entry.ttl - elapsed).as_secs() as u32;
let remaining = remaining_secs.max(1);
let mut wire = entry.wire.clone();
crate::wire::patch_id(&mut wire, new_id);
crate::wire::patch_ttls(&mut wire, &entry.meta.ttl_offsets, remaining);
let mut packet = entry.packet.clone();
adjust_ttls(&mut packet.answers, remaining);
adjust_ttls(&mut packet.authorities, remaining);
adjust_ttls(&mut packet.resources, remaining);
Some((packet, entry.dnssec_status))
Some((wire, entry.dnssec_status, freshness))
}
pub fn insert(&mut self, domain: &str, qtype: QueryType, packet: &DnsPacket) {
self.insert_with_status(domain, qtype, packet, DnssecStatus::Indeterminate);
}
pub fn insert_with_status(
pub fn insert_wire(
&mut self,
domain: &str,
qtype: QueryType,
packet: &DnsPacket,
wire: &[u8],
dnssec_status: DnssecStatus,
) {
let meta = match crate::wire::scan_ttl_offsets(wire) {
Ok(m) => m,
Err(_) => return, // malformed wire, skip
};
if self.entry_count >= self.max_entries {
self.evict_expired();
if self.entry_count >= self.max_entries {
return;
self.evict_stalest();
}
}
let min_ttl = extract_min_ttl(&packet.answers)
let min_ttl = crate::wire::min_ttl_from_wire(wire, &meta)
.unwrap_or(self.min_ttl)
.clamp(self.min_ttl, self.max_ttl);
@@ -117,7 +142,8 @@ impl DnsCache {
type_map.insert(
qtype,
CacheEntry {
packet: packet.clone(),
wire: wire.to_vec(),
meta,
inserted_at: Instant::now(),
ttl: Duration::from_secs(min_ttl as u64),
dnssec_status,
@@ -125,6 +151,64 @@ impl DnsCache {
);
}
/// Read-only lookup — expired entries are left in place (cleaned up on insert).
pub fn lookup(&self, domain: &str, qtype: QueryType) -> Option<DnsPacket> {
self.lookup_with_status(domain, qtype)
.map(|(pkt, _, _)| pkt)
}
pub fn lookup_with_status(
&self,
domain: &str,
qtype: QueryType,
) -> Option<(DnsPacket, DnssecStatus, Freshness)> {
let (wire, status, freshness) = self.lookup_wire(domain, qtype, 0)?;
let mut buf = BytePacketBuffer::from_bytes(&wire);
let pkt = DnsPacket::from_buffer(&mut buf).ok()?;
Some((pkt, status, freshness))
}
pub fn insert(&mut self, domain: &str, qtype: QueryType, packet: &DnsPacket) {
self.insert_with_status(domain, qtype, packet, DnssecStatus::Indeterminate);
}
pub fn insert_with_status(
&mut self,
domain: &str,
qtype: QueryType,
packet: &DnsPacket,
dnssec_status: DnssecStatus,
) {
let mut buf = BytePacketBuffer::new();
if packet.write(&mut buf).is_err() {
return;
}
self.insert_wire(domain, qtype, buf.filled(), dnssec_status);
}
pub fn ttl_remaining(&self, domain: &str, qtype: QueryType) -> Option<(u32, u32)> {
let type_map = self.entries.get(domain)?;
let entry = type_map.get(&qtype)?;
let elapsed = entry.inserted_at.elapsed();
if elapsed >= entry.ttl {
return None;
}
let total = entry.ttl.as_secs() as u32;
let remaining = (entry.ttl - elapsed).as_secs() as u32;
Some((remaining, total))
}
pub fn needs_warm(&self, domain: &str) -> bool {
for qtype in [QueryType::A, QueryType::AAAA] {
match self.ttl_remaining(domain, qtype) {
None => return true,
Some((remaining, total)) if remaining < total / 4 => return true,
_ => {}
}
}
false
}
pub fn len(&self) -> usize {
self.entry_count
}
@@ -156,7 +240,8 @@ impl DnsCache {
+ 1;
total += type_map.capacity() * inner_slot;
for entry in type_map.values() {
total += entry.packet.heap_bytes();
total += entry.wire.capacity()
+ entry.meta.ttl_offsets.capacity() * std::mem::size_of::<usize>();
}
}
total
@@ -197,6 +282,34 @@ impl DnsCache {
});
self.entry_count -= count;
}
/// Evict the single entry closest to (or furthest past) expiry.
fn evict_stalest(&mut self) {
let mut worst: Option<(String, QueryType, Duration)> = None;
for (domain, type_map) in &self.entries {
for (qtype, entry) in type_map {
let age = entry.inserted_at.elapsed();
let remaining = entry.ttl.saturating_sub(age);
match &worst {
None => worst = Some((domain.clone(), *qtype, remaining)),
Some((_, _, w)) if remaining < *w => {
worst = Some((domain.clone(), *qtype, remaining));
}
_ => {}
}
}
}
if let Some((domain, qtype, _)) = worst {
if let Some(type_map) = self.entries.get_mut(&domain) {
if type_map.remove(&qtype).is_some() {
self.entry_count -= 1;
}
if type_map.is_empty() {
self.entries.remove(&domain);
}
}
}
}
}
pub struct CacheInfo {
@@ -205,20 +318,11 @@ pub struct CacheInfo {
pub ttl_remaining: u32,
}
fn extract_min_ttl(records: &[DnsRecord]) -> Option<u32> {
records.iter().map(|r| r.ttl()).min()
}
fn adjust_ttls(records: &mut [DnsRecord], new_ttl: u32) {
for record in records.iter_mut() {
record.set_ttl(new_ttl);
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::packet::DnsPacket;
use crate::record::DnsRecord;
#[test]
fn heap_bytes_grows_with_entries() {
@@ -233,4 +337,66 @@ mod tests {
cache.insert("example.com", QueryType::A, &pkt);
assert!(cache.heap_bytes() > empty);
}
#[test]
fn ttl_remaining_returns_values_for_fresh_entry() {
let mut cache = DnsCache::new(100, 60, 3600);
let mut pkt = DnsPacket::new();
pkt.answers.push(DnsRecord::A {
domain: "example.com".into(),
addr: "1.2.3.4".parse().unwrap(),
ttl: 300,
});
cache.insert("example.com", QueryType::A, &pkt);
let (remaining, total) = cache.ttl_remaining("example.com", QueryType::A).unwrap();
assert_eq!(total, 300);
assert!(remaining <= 300);
assert!(remaining > 0);
}
#[test]
fn ttl_remaining_none_for_missing() {
let cache = DnsCache::new(100, 1, 3600);
assert!(cache.ttl_remaining("missing.com", QueryType::A).is_none());
}
#[test]
fn needs_warm_true_when_missing() {
let cache = DnsCache::new(100, 1, 3600);
assert!(cache.needs_warm("missing.com"));
}
#[test]
fn needs_warm_false_when_fresh() {
let mut cache = DnsCache::new(100, 1, 3600);
let mut pkt_a = DnsPacket::new();
pkt_a.answers.push(DnsRecord::A {
domain: "example.com".into(),
addr: "1.2.3.4".parse().unwrap(),
ttl: 300,
});
let mut pkt_aaaa = DnsPacket::new();
pkt_aaaa.answers.push(DnsRecord::AAAA {
domain: "example.com".into(),
addr: "::1".parse().unwrap(),
ttl: 300,
});
cache.insert("example.com", QueryType::A, &pkt_a);
cache.insert("example.com", QueryType::AAAA, &pkt_aaaa);
assert!(!cache.needs_warm("example.com"));
}
#[test]
fn needs_warm_true_when_only_a_cached() {
let mut cache = DnsCache::new(100, 1, 3600);
let mut pkt = DnsPacket::new();
pkt.answers.push(DnsRecord::A {
domain: "example.com".into(),
addr: "1.2.3.4".parse().unwrap(),
ttl: 300,
});
cache.insert("example.com", QueryType::A, &pkt);
// AAAA missing → needs warm
assert!(cache.needs_warm("example.com"));
}
}

View File

@@ -1,7 +1,7 @@
use std::collections::HashMap;
use std::net::Ipv4Addr;
use std::net::Ipv6Addr;
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr, SocketAddr};
use std::path::{Path, PathBuf};
use std::time::Duration;
use serde::Deserialize;
@@ -31,6 +31,54 @@ pub struct Config {
pub dnssec: DnssecConfig,
#[serde(default)]
pub dot: DotConfig,
#[serde(default)]
pub mobile: MobileConfig,
#[serde(default)]
pub forwarding: Vec<ForwardingRuleConfig>,
}
#[derive(Deserialize, Clone, Debug)]
pub struct ForwardingRuleConfig {
#[serde(deserialize_with = "string_or_vec")]
pub suffix: Vec<String>,
#[serde(deserialize_with = "string_or_vec")]
pub upstream: Vec<String>,
}
impl ForwardingRuleConfig {
fn to_runtime_rules(&self) -> Result<Vec<crate::system_dns::ForwardingRule>> {
if self.upstream.is_empty() {
return Err(format!(
"forwarding rule for suffix {:?}: upstream must not be empty",
self.suffix
)
.into());
}
let mut primary = Vec::with_capacity(self.upstream.len());
for s in &self.upstream {
let u = crate::forward::parse_upstream(s, 53)
.map_err(|e| format!("forwarding rule for upstream '{}': {}", s, e))?;
primary.push(u);
}
let pool = crate::forward::UpstreamPool::new(primary, vec![]);
Ok(self
.suffix
.iter()
.map(|s| crate::system_dns::ForwardingRule::new(s.clone(), pool.clone()))
.collect())
}
}
pub fn merge_forwarding_rules(
config_rules: &[ForwardingRuleConfig],
discovered: Vec<crate::system_dns::ForwardingRule>,
) -> Result<Vec<crate::system_dns::ForwardingRule>> {
let mut merged: Vec<crate::system_dns::ForwardingRule> = Vec::new();
for rule in config_rules {
merged.extend(rule.to_runtime_rules()?);
}
merged.extend(discovered);
Ok(merged)
}
#[derive(Deserialize)]
@@ -45,6 +93,12 @@ pub struct ServerConfig {
/// Defaults to `crate::data_dir()` (platform-specific system path) if unset.
#[serde(default)]
pub data_dir: Option<PathBuf>,
/// Synthesize NODATA (NOERROR + empty answer) for AAAA queries, and
/// strip `ipv6hint` from HTTPS/SVCB responses (RFC 9460). For IPv4-only
/// networks where Happy Eyeballs fallback adds latency. Local zones,
/// overrides, and the service proxy are not affected. Default false.
#[serde(default)]
pub filter_aaaa: bool,
}
impl Default for ServerConfig {
@@ -54,6 +108,7 @@ impl Default for ServerConfig {
api_port: default_api_port(),
api_bind_addr: default_api_bind_addr(),
data_dir: None,
filter_aaaa: false,
}
}
}
@@ -79,6 +134,7 @@ pub enum UpstreamMode {
#[default]
Forward,
Recursive,
Odoh,
}
impl UpstreamMode {
@@ -87,6 +143,20 @@ impl UpstreamMode {
UpstreamMode::Auto => "auto",
UpstreamMode::Forward => "forward",
UpstreamMode::Recursive => "recursive",
UpstreamMode::Odoh => "odoh",
}
}
/// Hedging duplicates the in-flight query against the same upstream to
/// rescue tail latency. Beneficial for UDP/DoH/DoT (cheap retransmit /
/// h2 stream multiplexing). For ODoH it doubles the relay's HPKE
/// seal/unseal load and the sealed-byte footprint a passive observer
/// can correlate, with no latency win — the relay hop dominates either
/// way. Force-zero in oblivious mode regardless of `hedge_ms`.
pub fn hedge_delay(self, hedge_ms: u64) -> Duration {
match self {
UpstreamMode::Odoh => Duration::ZERO,
_ => Duration::from_millis(hedge_ms),
}
}
}
@@ -95,34 +165,189 @@ impl UpstreamMode {
pub struct UpstreamConfig {
#[serde(default)]
pub mode: UpstreamMode,
#[serde(default = "default_upstream_addr")]
pub address: String,
#[serde(default, deserialize_with = "string_or_vec")]
pub address: Vec<String>,
#[serde(default = "default_upstream_port")]
pub port: u16,
#[serde(default, deserialize_with = "string_or_vec")]
pub fallback: Vec<String>,
#[serde(default = "default_timeout_ms")]
pub timeout_ms: u64,
#[serde(default = "default_hedge_ms")]
pub hedge_ms: u64,
#[serde(default = "default_root_hints")]
pub root_hints: Vec<String>,
#[serde(default = "default_prime_tlds")]
pub prime_tlds: Vec<String>,
#[serde(default = "default_srtt")]
pub srtt: bool,
/// Only used when `mode = "odoh"`. Full https:// URL of the relay
/// endpoint (including path, e.g. `https://odoh-relay.numa.rs/relay`).
#[serde(default)]
pub relay: Option<String>,
/// Only used when `mode = "odoh"`. Full https:// URL of the target
/// resolver (`https://odoh.cloudflare-dns.com/dns-query`).
#[serde(default)]
pub target: Option<String>,
/// Only used when `mode = "odoh"`. When true (the default), relay failure
/// returns SERVFAIL instead of downgrading to the `fallback` upstream —
/// a user who configured ODoH rarely wants a silent non-oblivious path.
#[serde(default)]
pub strict: Option<bool>,
/// Bootstrap IP for the relay host, used when numa is its own system
/// resolver (otherwise the ODoH HTTPS client loops resolving through
/// itself). TLS still validates the cert against `relay`'s hostname.
#[serde(default)]
pub relay_ip: Option<IpAddr>,
/// Same as `relay_ip` but for the target host.
#[serde(default)]
pub target_ip: Option<IpAddr>,
}
impl Default for UpstreamConfig {
fn default() -> Self {
UpstreamConfig {
mode: UpstreamMode::default(),
address: default_upstream_addr(),
address: Vec::new(),
port: default_upstream_port(),
fallback: Vec::new(),
timeout_ms: default_timeout_ms(),
hedge_ms: default_hedge_ms(),
root_hints: default_root_hints(),
prime_tlds: default_prime_tlds(),
srtt: default_srtt(),
relay: None,
target: None,
strict: None,
relay_ip: None,
target_ip: None,
}
}
}
/// Parsed ODoH config fields. `mode = "odoh"` requires both URLs to be
/// present, to parse as `https://`, and to resolve to distinct hosts.
#[derive(Debug)]
pub struct OdohUpstream {
pub relay_url: String,
pub relay_host: String,
pub target_host: String,
pub target_path: String,
pub strict: bool,
pub relay_bootstrap: Option<SocketAddr>,
pub target_bootstrap: Option<SocketAddr>,
}
impl UpstreamConfig {
/// Validate and extract ODoH-specific fields. Called during `load_config`
/// so misconfigured ODoH fails fast at startup, the same care we take
/// with the DNSSEC strict boot check.
pub fn odoh_upstream(&self) -> Result<OdohUpstream> {
let relay = self
.relay
.as_deref()
.ok_or("mode = \"odoh\" requires upstream.relay")?;
let target = self
.target
.as_deref()
.ok_or("mode = \"odoh\" requires upstream.target")?;
let relay_url = reqwest::Url::parse(relay)
.map_err(|e| format!("upstream.relay invalid URL '{}': {}", relay, e))?;
let target_url = reqwest::Url::parse(target)
.map_err(|e| format!("upstream.target invalid URL '{}': {}", target, e))?;
if relay_url.scheme() != "https" || target_url.scheme() != "https" {
return Err("upstream.relay and upstream.target must both use https://".into());
}
let relay_host = relay_url
.host_str()
.ok_or("upstream.relay must include a host")?
.to_string();
let target_host = target_url
.host_str()
.ok_or("upstream.target must include a host")?
.to_string();
if relay_host == target_host {
return Err(format!(
"upstream.relay and upstream.target resolve to the same host ({}); the privacy property requires distinct operators",
relay_host
)
.into());
}
if let Some(shared) = shared_registrable_domain(&relay_host, &target_host) {
return Err(format!(
"upstream.relay ({}) and upstream.target ({}) share the registrable domain ({}); the privacy property requires distinct operators",
relay_host, target_host, shared
)
.into());
}
let target_path = if target_url.path().is_empty() {
"/".to_string()
} else {
target_url.path().to_string()
};
let relay_port = relay_url.port_or_known_default().unwrap_or(443);
let target_port = target_url.port_or_known_default().unwrap_or(443);
Ok(OdohUpstream {
relay_url: relay.to_string(),
relay_host,
target_host,
target_path,
strict: self.strict.unwrap_or(true),
relay_bootstrap: self.relay_ip.map(|ip| SocketAddr::new(ip, relay_port)),
target_bootstrap: self.target_ip.map(|ip| SocketAddr::new(ip, target_port)),
})
}
}
/// Returns the registrable domain (eTLD+1) shared by both hosts, if any.
/// Fails open on hosts the PSL can't parse (IP literals, bare TLDs).
fn shared_registrable_domain(relay_host: &str, target_host: &str) -> Option<String> {
let relay = psl::domain(relay_host.as_bytes())?;
let target = psl::domain(target_host.as_bytes())?;
if relay.as_bytes() == target.as_bytes() {
std::str::from_utf8(relay.as_bytes())
.ok()
.map(str::to_owned)
} else {
None
}
}
fn string_or_vec<'de, D>(deserializer: D) -> std::result::Result<Vec<String>, D::Error>
where
D: serde::Deserializer<'de>,
{
struct Visitor;
impl<'de> serde::de::Visitor<'de> for Visitor {
type Value = Vec<String>;
fn expecting(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
f.write_str("string or array of strings")
}
fn visit_str<E: serde::de::Error>(self, v: &str) -> std::result::Result<Self::Value, E> {
Ok(vec![v.to_string()])
}
fn visit_seq<A: serde::de::SeqAccess<'de>>(
self,
mut seq: A,
) -> std::result::Result<Self::Value, A::Error> {
let mut v = Vec::new();
while let Some(s) = seq.next_element::<String>()? {
v.push(s);
}
Ok(v)
}
}
deserializer.deserialize_any(Visitor)
}
fn default_true() -> bool {
true
}
@@ -200,15 +425,15 @@ fn default_root_hints() -> Vec<String> {
]
}
fn default_upstream_addr() -> String {
String::new() // empty = auto-detect from system resolver
}
fn default_upstream_port() -> u16 {
53
}
fn default_timeout_ms() -> u64 {
5000
}
fn default_hedge_ms() -> u64 {
10
}
#[derive(Deserialize)]
pub struct CacheConfig {
@@ -218,6 +443,8 @@ pub struct CacheConfig {
pub min_ttl: u32,
#[serde(default = "default_max_ttl")]
pub max_ttl: u32,
#[serde(default)]
pub warm: Vec<String>,
}
impl Default for CacheConfig {
@@ -226,12 +453,13 @@ impl Default for CacheConfig {
max_entries: default_max_entries(),
min_ttl: default_min_ttl(),
max_ttl: default_max_ttl(),
warm: Vec::new(),
}
}
}
fn default_max_entries() -> usize {
10000
100_000
}
fn default_min_ttl() -> u32 {
60
@@ -379,7 +607,7 @@ pub struct DnssecConfig {
#[derive(Deserialize, Clone)]
pub struct DotConfig {
#[serde(default)]
#[serde(default = "default_dot_enabled")]
pub enabled: bool,
#[serde(default = "default_dot_port")]
pub port: u16,
@@ -396,7 +624,7 @@ pub struct DotConfig {
impl Default for DotConfig {
fn default() -> Self {
DotConfig {
enabled: false,
enabled: default_dot_enabled(),
port: default_dot_port(),
bind_addr: default_dot_bind_addr(),
cert_path: None,
@@ -405,6 +633,9 @@ impl Default for DotConfig {
}
}
fn default_dot_enabled() -> bool {
true
}
fn default_dot_port() -> u16 {
853
}
@@ -412,6 +643,53 @@ fn default_dot_bind_addr() -> String {
"0.0.0.0".to_string()
}
/// Configuration for the mobile API — a persistent HTTP listener that
/// serves a read-only subset of routes (`/health`, `/ca.pem`,
/// `/mobileconfig`, `/ca.mobileconfig`) on a LAN-reachable port, for
/// consumption by the iOS/Android companion apps.
///
/// Unlike the main API (port 5380, localhost-only by default, supports
/// state-mutating routes), the mobile API is safe to expose on the LAN
/// because every route is idempotent and read-only.
#[derive(Deserialize, Clone)]
pub struct MobileConfig {
/// If true, spawn the mobile API listener at startup. **Default false.**
/// Opt-in because the listener binds to the LAN by default and exposes
/// a few read-only endpoints to any device on the same network (`/health`,
/// `/ca.pem`, `/mobileconfig`, `/ca.mobileconfig`). None of those are
/// cryptographically sensitive (the CA private key is never served),
/// but users should enable this explicitly rather than have a new
/// LAN-reachable port appear after an upgrade.
#[serde(default)]
pub enabled: bool,
/// Port for the mobile API. Default 8765.
#[serde(default = "default_mobile_port")]
pub port: u16,
/// Bind address for the mobile API. Default "0.0.0.0" (all interfaces)
/// so phones on the LAN can reach it. Set to "127.0.0.1" to restrict
/// to localhost — useful if you're running behind another front-end.
#[serde(default = "default_mobile_bind_addr")]
pub bind_addr: String,
}
impl Default for MobileConfig {
fn default() -> Self {
MobileConfig {
enabled: false,
port: default_mobile_port(),
bind_addr: default_mobile_bind_addr(),
}
}
}
fn default_mobile_port() -> u16 {
8765
}
fn default_mobile_bind_addr() -> String {
"0.0.0.0".to_string()
}
#[cfg(test)]
mod tests {
use super::*;
@@ -446,6 +724,17 @@ mod tests {
assert!(config.lan.enabled);
}
#[test]
fn filter_aaaa_defaults_false() {
assert!(!ServerConfig::default().filter_aaaa);
}
#[test]
fn filter_aaaa_parses_from_server_section() {
let config: Config = toml::from_str("[server]\nfilter_aaaa = true").unwrap();
assert!(config.server.filter_aaaa);
}
#[test]
fn custom_bind_addrs_parse() {
let toml = r#"
@@ -476,6 +765,496 @@ mod tests {
assert!(config.services[0].routes[0].strip);
assert!(!config.services[0].routes[1].strip); // default false
}
#[test]
fn address_string_parses_to_vec() {
let config: Config = toml::from_str("[upstream]\naddress = \"1.2.3.4\"").unwrap();
assert_eq!(config.upstream.address, vec!["1.2.3.4"]);
}
#[test]
fn address_array_parses() {
let config: Config =
toml::from_str("[upstream]\naddress = [\"1.2.3.4\", \"5.6.7.8:5353\"]").unwrap();
assert_eq!(config.upstream.address, vec!["1.2.3.4", "5.6.7.8:5353"]);
}
#[test]
fn fallback_array_parses() {
let config: Config =
toml::from_str("[upstream]\nfallback = [\"8.8.8.8\", \"1.1.1.1\"]").unwrap();
assert_eq!(config.upstream.fallback, vec!["8.8.8.8", "1.1.1.1"]);
}
#[test]
fn fallback_string_parses_as_singleton_vec() {
let config: Config =
toml::from_str("[upstream]\nfallback = \"tls://1.1.1.1#cloudflare-dns.com\"").unwrap();
assert_eq!(
config.upstream.fallback,
vec!["tls://1.1.1.1#cloudflare-dns.com"]
);
}
#[test]
fn empty_address_gives_empty_vec() {
let config: Config = toml::from_str("").unwrap();
assert!(config.upstream.address.is_empty());
assert!(config.upstream.fallback.is_empty());
}
// ── [upstream] mode = "odoh" ────────────────────────────────────────
#[test]
fn odoh_config_parses_and_validates() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://odoh-relay.numa.rs/relay"
target = "https://odoh.cloudflare-dns.com/dns-query"
"#;
let config: Config = toml::from_str(toml).unwrap();
assert!(matches!(config.upstream.mode, UpstreamMode::Odoh));
let odoh = config.upstream.odoh_upstream().unwrap();
assert_eq!(odoh.relay_url, "https://odoh-relay.numa.rs/relay");
assert_eq!(odoh.target_host, "odoh.cloudflare-dns.com");
assert_eq!(odoh.target_path, "/dns-query");
assert!(odoh.strict, "strict defaults to true under mode=odoh");
}
#[test]
fn odoh_strict_false_is_honoured() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://odoh-relay.numa.rs/relay"
target = "https://odoh.cloudflare-dns.com/dns-query"
strict = false
"#;
let config: Config = toml::from_str(toml).unwrap();
assert!(!config.upstream.odoh_upstream().unwrap().strict);
}
#[test]
fn odoh_rejects_same_host_relay_and_target() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://odoh.example.com/relay"
target = "https://odoh.example.com/dns-query"
"#;
let config: Config = toml::from_str(toml).unwrap();
let err = config.upstream.odoh_upstream().unwrap_err().to_string();
assert!(err.contains("same host"), "got: {err}");
}
#[test]
fn odoh_rejects_shared_registrable_domain() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://r.cloudflare.com/relay"
target = "https://odoh.cloudflare.com/dns-query"
"#;
let config: Config = toml::from_str(toml).unwrap();
let err = config.upstream.odoh_upstream().unwrap_err().to_string();
assert!(err.contains("registrable domain"), "got: {err}");
assert!(err.contains("cloudflare.com"), "got: {err}");
}
#[test]
fn odoh_rejects_shared_registrable_under_multi_label_suffix() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://a.foo.co.uk/relay"
target = "https://b.foo.co.uk/dns-query"
"#;
let config: Config = toml::from_str(toml).unwrap();
let err = config.upstream.odoh_upstream().unwrap_err().to_string();
assert!(err.contains("foo.co.uk"), "got: {err}");
}
#[test]
fn odoh_accepts_distinct_registrable_under_multi_label_suffix() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://relay.foo.co.uk/relay"
target = "https://target.bar.co.uk/dns-query"
"#;
let config: Config = toml::from_str(toml).unwrap();
assert!(config.upstream.odoh_upstream().is_ok());
}
#[test]
fn odoh_accepts_distinct_private_psl_suffix_subdomains() {
// *.github.io is a public suffix, so foo.github.io and bar.github.io
// are independent registrable domains — accept.
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://foo.github.io/relay"
target = "https://bar.github.io/dns-query"
"#;
let config: Config = toml::from_str(toml).unwrap();
assert!(config.upstream.odoh_upstream().is_ok());
}
#[test]
fn odoh_rejects_non_https() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "http://odoh-relay.numa.rs/relay"
target = "https://odoh.cloudflare-dns.com/dns-query"
"#;
let config: Config = toml::from_str(toml).unwrap();
let err = config.upstream.odoh_upstream().unwrap_err().to_string();
assert!(err.contains("https"), "got: {err}");
}
#[test]
fn odoh_missing_relay_rejected() {
let toml = r#"
[upstream]
mode = "odoh"
target = "https://odoh.cloudflare-dns.com/dns-query"
"#;
let config: Config = toml::from_str(toml).unwrap();
let err = config.upstream.odoh_upstream().unwrap_err().to_string();
assert!(err.contains("upstream.relay"), "got: {err}");
}
#[test]
fn odoh_bootstrap_ips_parse_into_socket_addrs() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://odoh-relay.numa.rs/relay"
target = "https://odoh.cloudflare-dns.com/dns-query"
relay_ip = "178.104.229.30"
target_ip = "104.16.249.249"
"#;
let config: Config = toml::from_str(toml).unwrap();
let odoh = config.upstream.odoh_upstream().unwrap();
assert_eq!(odoh.relay_host, "odoh-relay.numa.rs");
assert_eq!(
odoh.relay_bootstrap.unwrap().to_string(),
"178.104.229.30:443"
);
assert_eq!(
odoh.target_bootstrap.unwrap().to_string(),
"104.16.249.249:443"
);
}
#[test]
fn odoh_bootstrap_ips_optional() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://odoh-relay.numa.rs/relay"
target = "https://odoh.cloudflare-dns.com/dns-query"
"#;
let config: Config = toml::from_str(toml).unwrap();
let odoh = config.upstream.odoh_upstream().unwrap();
assert!(odoh.relay_bootstrap.is_none());
assert!(odoh.target_bootstrap.is_none());
}
#[test]
fn odoh_bootstrap_ip_rejects_garbage() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://odoh-relay.numa.rs/relay"
target = "https://odoh.cloudflare-dns.com/dns-query"
relay_ip = "not-an-ip"
"#;
let err = toml::from_str::<Config>(toml).err().unwrap().to_string();
assert!(err.contains("relay_ip"), "got: {err}");
}
#[test]
fn odoh_bootstrap_uses_url_port_when_non_default() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://odoh-relay.numa.rs:8443/relay"
target = "https://odoh.cloudflare-dns.com/dns-query"
relay_ip = "178.104.229.30"
"#;
let config: Config = toml::from_str(toml).unwrap();
let odoh = config.upstream.odoh_upstream().unwrap();
assert_eq!(
odoh.relay_bootstrap.unwrap().to_string(),
"178.104.229.30:8443"
);
}
#[test]
fn hedge_delay_zeroed_for_odoh_mode() {
assert_eq!(
UpstreamMode::Odoh.hedge_delay(50),
Duration::ZERO,
"ODoH mode must zero hedge regardless of configured hedge_ms"
);
assert_eq!(
UpstreamMode::Forward.hedge_delay(50),
Duration::from_millis(50),
"non-ODoH modes honour configured hedge_ms"
);
}
#[test]
fn odoh_missing_target_rejected() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://odoh-relay.numa.rs/relay"
"#;
let config: Config = toml::from_str(toml).unwrap();
let err = config.upstream.odoh_upstream().unwrap_err().to_string();
assert!(err.contains("upstream.target"), "got: {err}");
}
// ── issue #82: [[forwarding]] config section ────────────────────────
#[test]
fn forwarding_empty_by_default() {
let config: Config = toml::from_str("").unwrap();
assert!(config.forwarding.is_empty());
}
#[test]
fn forwarding_parses_single_rule() {
let toml = r#"
[[forwarding]]
suffix = "home.local"
upstream = "100.90.1.63:5361"
"#;
let config: Config = toml::from_str(toml).unwrap();
assert_eq!(config.forwarding.len(), 1);
assert_eq!(config.forwarding[0].suffix, &["home.local"]);
assert_eq!(config.forwarding[0].upstream, vec!["100.90.1.63:5361"]);
}
#[test]
fn forwarding_parses_reverse_dns_zone() {
let toml = r#"
[[forwarding]]
suffix = "168.192.in-addr.arpa"
upstream = "100.90.1.63:5361"
"#;
let config: Config = toml::from_str(toml).unwrap();
assert_eq!(config.forwarding.len(), 1);
assert_eq!(config.forwarding[0].suffix, &["168.192.in-addr.arpa"]);
}
#[test]
fn forwarding_parses_multiple_rules() {
let toml = r#"
[[forwarding]]
suffix = "168.192.in-addr.arpa"
upstream = "100.90.1.63:5361"
[[forwarding]]
suffix = "home.local"
upstream = "10.0.0.1"
"#;
let config: Config = toml::from_str(toml).unwrap();
assert_eq!(config.forwarding.len(), 2);
assert_eq!(config.forwarding[1].upstream, vec!["10.0.0.1"]);
}
#[test]
fn forwarding_parses_suffix_array() {
let toml = r#"
[[forwarding]]
suffix = ["168.192.in-addr.arpa", "onsite"]
upstream = "192.168.88.1"
"#;
let config: Config = toml::from_str(toml).unwrap();
assert_eq!(config.forwarding.len(), 1);
assert_eq!(
config.forwarding[0].suffix,
&["168.192.in-addr.arpa", "onsite"]
);
}
#[test]
fn forwarding_suffix_array_expands_to_multiple_runtime_rules() {
let rule = ForwardingRuleConfig {
suffix: vec!["168.192.in-addr.arpa".to_string(), "onsite".to_string()],
upstream: vec!["192.168.88.1".to_string()],
};
let runtime = rule.to_runtime_rules().unwrap();
assert_eq!(runtime.len(), 2);
assert_eq!(runtime[0].suffix, "168.192.in-addr.arpa");
assert_eq!(runtime[1].suffix, "onsite");
assert_eq!(
runtime[0].upstream.preferred(),
runtime[1].upstream.preferred()
);
}
#[test]
fn forwarding_upstream_with_explicit_port() {
let rule = ForwardingRuleConfig {
suffix: vec!["home.local".to_string()],
upstream: vec!["100.90.1.63:5361".to_string()],
};
let runtime = rule.to_runtime_rules().unwrap();
assert_eq!(runtime.len(), 1);
let preferred = runtime[0].upstream.preferred().unwrap();
assert!(matches!(preferred, crate::forward::Upstream::Udp(_)));
assert_eq!(preferred.to_string(), "100.90.1.63:5361");
assert_eq!(runtime[0].suffix, "home.local");
}
#[test]
fn forwarding_upstream_defaults_to_port_53() {
let rule = ForwardingRuleConfig {
suffix: vec!["home.local".to_string()],
upstream: vec!["100.90.1.63".to_string()],
};
let runtime = rule.to_runtime_rules().unwrap();
assert_eq!(
runtime[0].upstream.preferred().unwrap().to_string(),
"100.90.1.63:53"
);
}
#[test]
fn forwarding_invalid_upstream_returns_error() {
let rule = ForwardingRuleConfig {
suffix: vec!["home.local".to_string()],
upstream: vec!["not-a-valid-host".to_string()],
};
assert!(rule.to_runtime_rules().is_err());
}
#[test]
fn forwarding_upstream_accepts_dot_scheme() {
let rule = ForwardingRuleConfig {
suffix: vec!["google.com".to_string()],
upstream: vec!["tls://9.9.9.9#dns.quad9.net".to_string()],
};
let runtime = rule
.to_runtime_rules()
.expect("tls:// upstream should parse");
assert_eq!(runtime.len(), 1);
assert_eq!(
runtime[0].upstream.preferred().unwrap().to_string(),
"tls://9.9.9.9:853#dns.quad9.net"
);
}
#[test]
fn forwarding_upstream_accepts_doh_scheme() {
let rule = ForwardingRuleConfig {
suffix: vec!["goog".to_string()],
upstream: vec!["https://dns.quad9.net/dns-query".to_string()],
};
let runtime = rule
.to_runtime_rules()
.expect("https:// upstream should parse");
assert_eq!(runtime.len(), 1);
assert_eq!(
runtime[0].upstream.preferred().unwrap().to_string(),
"https://dns.quad9.net/dns-query"
);
}
#[test]
fn forwarding_config_rules_take_precedence_over_discovered() {
let config_rules = vec![ForwardingRuleConfig {
suffix: vec!["home.local".to_string()],
upstream: vec!["10.0.0.1:53".to_string()],
}];
let discovered = vec![crate::system_dns::ForwardingRule::new(
"home.local".to_string(),
crate::forward::UpstreamPool::new(
vec![crate::forward::Upstream::Udp(
"192.168.1.1:53".parse().unwrap(),
)],
vec![],
),
)];
let merged = merge_forwarding_rules(&config_rules, discovered).unwrap();
let picked = crate::system_dns::match_forwarding_rule("host.home.local", &merged)
.expect("rule should match");
assert_eq!(picked.preferred().unwrap().to_string(), "10.0.0.1:53");
}
#[test]
fn forwarding_merge_preserves_non_overlapping_discovered() {
let config_rules = vec![ForwardingRuleConfig {
suffix: vec!["home.local".to_string()],
upstream: vec!["10.0.0.1:53".to_string()],
}];
let discovered = vec![crate::system_dns::ForwardingRule::new(
"corp.example".to_string(),
crate::forward::UpstreamPool::new(
vec![crate::forward::Upstream::Udp(
"192.168.1.1:53".parse().unwrap(),
)],
vec![],
),
)];
let merged = merge_forwarding_rules(&config_rules, discovered).unwrap();
assert_eq!(merged.len(), 2);
let picked = crate::system_dns::match_forwarding_rule("host.corp.example", &merged)
.expect("discovered rule should still match");
assert_eq!(picked.preferred().unwrap().to_string(), "192.168.1.1:53");
}
#[test]
fn forwarding_merge_suffix_array_expands_to_multiple_rules() {
let config_rules = vec![ForwardingRuleConfig {
suffix: vec!["a.local".to_string(), "b.local".to_string()],
upstream: vec!["10.0.0.1:53".to_string()],
}];
let merged = merge_forwarding_rules(&config_rules, vec![]).unwrap();
assert_eq!(merged.len(), 2);
}
#[test]
fn forwarding_parses_upstream_array() {
let toml = r#"
[[forwarding]]
suffix = "google.com"
upstream = ["tls://9.9.9.9#dns.quad9.net", "tls://149.112.112.112#dns.quad9.net"]
"#;
let config: Config = toml::from_str(toml).unwrap();
assert_eq!(config.forwarding.len(), 1);
assert_eq!(config.forwarding[0].upstream.len(), 2);
}
#[test]
fn forwarding_upstream_array_builds_pool_with_multiple_primaries() {
let rule = ForwardingRuleConfig {
suffix: vec!["google.com".to_string()],
upstream: vec![
"tls://9.9.9.9#dns.quad9.net".to_string(),
"tls://149.112.112.112#dns.quad9.net".to_string(),
],
};
let runtime = rule.to_runtime_rules().unwrap();
assert_eq!(runtime.len(), 1);
let label = runtime[0].upstream.label();
assert!(label.contains("+1 more"), "label was: {}", label);
}
#[test]
fn forwarding_empty_upstream_array_errors() {
let rule = ForwardingRuleConfig {
suffix: vec!["home.local".to_string()],
upstream: vec![],
};
assert!(rule.to_runtime_rules().is_err());
}
}
pub struct ConfigLoad {
@@ -503,6 +1282,13 @@ pub fn load_config(path: &str) -> Result<ConfigLoad> {
let filename = p.file_name().unwrap_or(p.as_os_str());
v.push(crate::config_dir().join(filename));
v.push(crate::data_dir().join(filename));
// Interactive root and sudo'd users: always consult the XDG path
// so `touch ~/.config/numa/numa.toml` works regardless of whether
// config_dir() routed to FHS (issue #81).
let suggested = crate::suggested_config_path();
if !v.contains(&suggested) {
v.push(suggested);
}
}
v
};
@@ -523,11 +1309,7 @@ pub fn load_config(path: &str) -> Result<ConfigLoad> {
}
}
// Show config_dir candidate as the "expected" path — it's actionable
let display_path = candidates
.get(1)
.map(|p| p.to_string_lossy().to_string())
.unwrap_or_else(|| resolve_path(path));
let display_path = crate::suggested_config_path().to_string_lossy().to_string();
log::info!("config not found, using defaults (create {})", display_path);
Ok(ConfigLoad {
config: Config::default(),

View File

@@ -1,7 +1,7 @@
use std::collections::HashMap;
use std::collections::{HashMap, HashSet};
use std::net::SocketAddr;
use std::path::PathBuf;
use std::sync::{Mutex, RwLock};
use std::sync::{Arc, Mutex, RwLock};
use std::time::{Duration, Instant, SystemTime};
use arc_swap::ArcSwap;
@@ -16,8 +16,11 @@ use crate::blocklist::BlocklistStore;
use crate::buffer::BytePacketBuffer;
use crate::cache::{DnsCache, DnssecStatus};
use crate::config::{UpstreamMode, ZoneMap};
use crate::forward::{forward_query, Upstream};
#[cfg(test)]
use crate::forward::Upstream;
use crate::forward::{forward_with_failover_raw, UpstreamPool};
use crate::header::ResultCode;
use crate::health::HealthMeta;
use crate::lan::PeerStore;
use crate::override_store::OverrideStore;
use crate::packet::DnsPacket;
@@ -26,7 +29,7 @@ use crate::question::QueryType;
use crate::record::DnsRecord;
use crate::service_store::ServiceStore;
use crate::srtt::SrttCache;
use crate::stats::{QueryPath, ServerStats};
use crate::stats::{QueryPath, ServerStats, Transport};
use crate::system_dns::ForwardingRule;
pub struct ServerCtx {
@@ -34,6 +37,8 @@ pub struct ServerCtx {
pub zone_map: ZoneMap,
/// std::sync::RwLock (not tokio) — locks must never be held across .await points.
pub cache: RwLock<DnsCache>,
/// Domains currently being refreshed in the background (dedup guard).
pub refreshing: Mutex<HashSet<(String, QueryType)>>,
pub stats: Mutex<ServerStats>,
pub overrides: RwLock<OverrideStore>,
pub blocklist: RwLock<BlocklistStore>,
@@ -41,11 +46,12 @@ pub struct ServerCtx {
pub services: Mutex<ServiceStore>,
pub lan_peers: Mutex<PeerStore>,
pub forwarding_rules: Vec<ForwardingRule>,
pub upstream: Mutex<Upstream>,
pub upstream_pool: Mutex<UpstreamPool>,
pub upstream_auto: bool,
pub upstream_port: u16,
pub lan_ip: Mutex<std::net::Ipv4Addr>,
pub timeout: Duration,
pub hedge_delay: Duration,
pub proxy_tld: String,
pub proxy_tld_suffix: String, // pre-computed ".{tld}" to avoid per-query allocation
pub lan_enabled: bool,
@@ -60,6 +66,21 @@ pub struct ServerCtx {
pub inflight: Mutex<InflightMap>,
pub dnssec_enabled: bool,
pub dnssec_strict: bool,
/// Cached health metadata (version, hostname, DoT config, CA
/// fingerprint, features). Shared between the main and mobile
/// API `/health` handlers. Built once at startup in `main.rs`.
pub health_meta: HealthMeta,
/// CA certificate in PEM form, cached at startup. `None` if no
/// TLS-using feature is enabled and the CA hasn't been generated.
/// Used by `/ca.pem`, `/mobileconfig`, and `/ca.mobileconfig`
/// handlers to avoid per-request disk I/O on the hot path.
pub ca_pem: Option<String>,
pub mobile_enabled: bool,
pub mobile_port: u16,
/// When true, AAAA queries short-circuit with NODATA (NOERROR + empty
/// answer) instead of hitting cache/forwarding/upstream. Local data
/// (overrides, zones, .numa proxy, blocklist sinkhole) is unaffected.
pub filter_aaaa: bool,
}
/// Transport-agnostic DNS resolution. Runs the full pipeline (overrides, blocklist,
@@ -69,9 +90,11 @@ pub struct ServerCtx {
/// (and logging parse errors) before calling this function.
pub async fn resolve_query(
query: DnsPacket,
raw_wire: &[u8],
src_addr: SocketAddr,
ctx: &ServerCtx,
) -> crate::Result<BytePacketBuffer> {
ctx: &Arc<ServerCtx>,
transport: Transport,
) -> crate::Result<(BytePacketBuffer, QueryPath)> {
let start = Instant::now();
let (qname, qtype) = match query.questions.first() {
@@ -79,8 +102,10 @@ pub async fn resolve_query(
None => return Err("empty question section".into()),
};
// Pipeline: overrides -> .tld interception -> blocklist -> local zones -> cache -> upstream
// Pipeline: overrides -> .localhost -> local zones -> special-use (unless forwarded)
// -> .tld proxy -> blocklist -> cache -> forwarding -> recursive/upstream
// Each lock is scoped to avoid holding MutexGuard across await points.
let mut upstream_transport: Option<crate::stats::UpstreamTransport> = None;
let (response, path, dnssec) = {
let override_record = ctx.overrides.read().unwrap().lookup(&qname);
if let Some(record) = override_record {
@@ -98,8 +123,14 @@ pub async fn resolve_query(
300,
));
(resp, QueryPath::Local, DnssecStatus::Indeterminate)
} else if is_special_use_domain(&qname) {
// RFC 6761/8880: private PTR, DDR, NAT64 — answer locally
} else if let Some(records) = ctx.zone_map.get(qname.as_str()).and_then(|m| m.get(&qtype)) {
let mut resp = DnsPacket::response_from(&query, ResultCode::NOERROR);
resp.answers = records.clone();
(resp, QueryPath::Local, DnssecStatus::Indeterminate)
} else if is_special_use_domain(&qname)
&& crate::system_dns::match_forwarding_rule(&qname, &ctx.forwarding_rules).is_none()
{
// RFC 6761/8880: answer locally unless a forwarding rule covers this zone.
let resp = special_use_response(&query, &qname, qtype);
(resp, QueryPath::Local, DnssecStatus::Indeterminate)
} else if !ctx.proxy_tld_suffix.is_empty()
@@ -146,30 +177,59 @@ pub async fn resolve_query(
60,
));
(resp, QueryPath::Blocked, DnssecStatus::Indeterminate)
} else if let Some(records) = ctx.zone_map.get(qname.as_str()).and_then(|m| m.get(&qtype)) {
let mut resp = DnsPacket::response_from(&query, ResultCode::NOERROR);
resp.answers = records.clone();
} else if qtype == QueryType::AAAA && ctx.filter_aaaa {
// RFC 2308 NODATA: NOERROR with empty answer section. Prevents
// Happy Eyeballs clients from waiting on an AAAA they'll never use
// on IPv4-only networks. NXDOMAIN would be wrong (it'd imply the
// name doesn't exist for A either).
let resp = DnsPacket::response_from(&query, ResultCode::NOERROR);
(resp, QueryPath::Local, DnssecStatus::Indeterminate)
} else {
let cached = ctx.cache.read().unwrap().lookup_with_status(&qname, qtype);
if let Some((cached, cached_dnssec)) = cached {
if let Some((cached, cached_dnssec, freshness)) = cached {
if freshness.needs_refresh() {
let key = (qname.clone(), qtype);
let already = !ctx.refreshing.lock().unwrap().insert(key.clone());
if !already {
let ctx = Arc::clone(ctx);
tokio::spawn(async move {
refresh_entry(&ctx, &key.0, key.1).await;
ctx.refreshing.lock().unwrap().remove(&key);
});
}
}
let mut resp = cached;
resp.header.id = query.header.id;
if cached_dnssec == DnssecStatus::Secure {
resp.header.authed_data = true;
}
(resp, QueryPath::Cached, cached_dnssec)
} else if let Some(fwd_addr) =
} else if let Some(pool) =
crate::system_dns::match_forwarding_rule(&qname, &ctx.forwarding_rules)
{
// Conditional forwarding takes priority over recursive mode
// (e.g. Tailscale .ts.net, VPC private zones)
let upstream = Upstream::Udp(fwd_addr);
match forward_query(&query, &upstream, ctx.timeout).await {
Ok(resp) => {
ctx.cache.write().unwrap().insert(&qname, qtype, &resp);
(resp, QueryPath::Forwarded, DnssecStatus::Indeterminate)
}
upstream_transport = pool.preferred().map(|u| u.transport());
match forward_with_failover_raw(
raw_wire,
pool,
&ctx.srtt,
ctx.timeout,
ctx.hedge_delay,
)
.await
{
Ok(resp_wire) => match cache_and_parse(ctx, &qname, qtype, &resp_wire) {
Ok(resp) => (resp, QueryPath::Forwarded, DnssecStatus::Indeterminate),
Err(e) => {
error!("{} | {:?} {} | PARSE ERROR | {}", src_addr, qtype, qname, e);
(
DnsPacket::response_from(&query, ResultCode::SERVFAIL),
QueryPath::UpstreamError,
DnssecStatus::Indeterminate,
)
}
},
Err(e) => {
error!(
"{} | {:?} {} | FORWARD ERROR | {}",
@@ -183,6 +243,9 @@ pub async fn resolve_query(
}
}
} else if ctx.upstream_mode == UpstreamMode::Recursive {
// Recursive resolution makes UDP hops to roots/TLDs/auths;
// tag as Udp so the dashboard can aggregate plaintext-wire
// egress honestly. Only mark on success — errors stay None.
let key = (qname.clone(), qtype);
let (resp, path, err) = resolve_coalesced(&ctx.inflight, key, &query, || {
crate::recursive::resolve_recursive(
@@ -205,19 +268,35 @@ pub async fn resolve_query(
qname,
err.as_deref().unwrap_or("leader failed")
);
} else {
upstream_transport = Some(crate::stats::UpstreamTransport::Udp);
}
(resp, path, DnssecStatus::Indeterminate)
} else {
let upstream =
match crate::system_dns::match_forwarding_rule(&qname, &ctx.forwarding_rules) {
Some(addr) => Upstream::Udp(addr),
None => ctx.upstream.lock().unwrap().clone(),
};
match forward_query(&query, &upstream, ctx.timeout).await {
Ok(resp) => {
ctx.cache.write().unwrap().insert(&qname, qtype, &resp);
(resp, QueryPath::Forwarded, DnssecStatus::Indeterminate)
}
let pool = ctx.upstream_pool.lock().unwrap().clone();
match forward_with_failover_raw(
raw_wire,
&pool,
&ctx.srtt,
ctx.timeout,
ctx.hedge_delay,
)
.await
{
Ok(resp_wire) => match cache_and_parse(ctx, &qname, qtype, &resp_wire) {
Ok(resp) => {
upstream_transport = pool.preferred().map(|u| u.transport());
(resp, QueryPath::Upstream, DnssecStatus::Indeterminate)
}
Err(e) => {
error!("{} | {:?} {} | PARSE ERROR | {}", src_addr, qtype, qname, e);
(
DnsPacket::response_from(&query, ResultCode::SERVFAIL),
QueryPath::UpstreamError,
DnssecStatus::Indeterminate,
)
}
},
Err(e) => {
error!(
"{} | {:?} {} | UPSTREAM ERROR | {}",
@@ -276,6 +355,15 @@ pub async fn resolve_query(
strip_dnssec_records(&mut response);
}
// filter_aaaa: also strip ipv6hint from HTTPS/SVCB answers so modern
// browsers (Chrome ≥103 etc.) don't receive v6 address hints via the
// HTTPS record path that bypasses AAAA entirely. Gated on !client_do
// because modifying rdata invalidates any accompanying RRSIG — a DO-bit
// validator downstream would reject the response as Bogus.
if ctx.filter_aaaa && !client_do {
strip_svcb_ipv6_hints(&mut response);
}
// Echo EDNS back if client sent it
if query.edns.is_some() {
response.edns = Some(crate::packet::EdnsOpt {
@@ -319,7 +407,7 @@ pub async fn resolve_query(
// Record stats and query log
{
let mut s = ctx.stats.lock().unwrap();
let total = s.record(path);
let total = s.record(path, transport, upstream_transport);
if total.is_multiple_of(1000) {
s.log_summary();
}
@@ -331,19 +419,76 @@ pub async fn resolve_query(
domain: qname,
query_type: qtype,
path,
transport,
rescode: response.header.rescode,
latency_us: elapsed.as_micros() as u64,
dnssec,
});
Ok(resp_buffer)
Ok((resp_buffer, path))
}
fn cache_and_parse(
ctx: &ServerCtx,
qname: &str,
qtype: QueryType,
resp_wire: &[u8],
) -> crate::Result<DnsPacket> {
ctx.cache
.write()
.unwrap()
.insert_wire(qname, qtype, resp_wire, DnssecStatus::Indeterminate);
let mut buf = BytePacketBuffer::from_bytes(resp_wire);
DnsPacket::from_buffer(&mut buf)
}
/// Re-resolve a single (domain, qtype) and update the cache.
/// Used for both stale-entry refresh and proactive cache warming.
pub async fn refresh_entry(ctx: &ServerCtx, qname: &str, qtype: QueryType) {
let query = DnsPacket::query(0, qname, qtype);
if ctx.upstream_mode == UpstreamMode::Recursive {
if let Ok(resp) = crate::recursive::resolve_recursive(
qname,
qtype,
&ctx.cache,
&query,
&ctx.root_hints,
&ctx.srtt,
)
.await
{
ctx.cache.write().unwrap().insert(qname, qtype, &resp);
}
} else {
let mut buf = BytePacketBuffer::new();
if query.write(&mut buf).is_ok() {
let pool = ctx.upstream_pool.lock().unwrap().clone();
if let Ok(wire) = forward_with_failover_raw(
buf.filled(),
&pool,
&ctx.srtt,
ctx.timeout,
ctx.hedge_delay,
)
.await
{
ctx.cache.write().unwrap().insert_wire(
qname,
qtype,
&wire,
DnssecStatus::Indeterminate,
);
}
}
}
}
/// Handle a DNS query received over UDP. Thin wrapper around resolve_query.
pub async fn handle_query(
mut buffer: BytePacketBuffer,
raw_len: usize,
src_addr: SocketAddr,
ctx: &ServerCtx,
ctx: &Arc<ServerCtx>,
transport: Transport,
) -> crate::Result<()> {
let query = match DnsPacket::from_buffer(&mut buffer) {
Ok(packet) => packet,
@@ -352,8 +497,8 @@ pub async fn handle_query(
return Ok(());
}
};
match resolve_query(query, src_addr, ctx).await {
Ok(resp_buffer) => {
match resolve_query(query, &buffer.buf[..raw_len], src_addr, ctx, transport).await {
Ok((resp_buffer, _)) => {
ctx.socket.send_to(resp_buffer.filled(), src_addr).await?;
}
Err(e) => {
@@ -376,6 +521,20 @@ fn strip_dnssec_records(pkt: &mut DnsPacket) {
pkt.resources.retain(|r| !is_dnssec_record(r));
}
fn strip_svcb_ipv6_hints(pkt: &mut DnsPacket) {
let https_qtype = QueryType::HTTPS.to_num();
let svcb_qtype = QueryType::SVCB.to_num();
pkt.for_each_record_mut(|rec| {
if let DnsRecord::UNKNOWN { qtype, data, .. } = rec {
if *qtype == https_qtype || *qtype == svcb_qtype {
if let Some(new_data) = crate::svcb::strip_ipv6hint(data) {
*data = new_data;
}
}
}
});
}
fn is_special_use_domain(qname: &str) -> bool {
if qname.ends_with(".in-addr.arpa") {
// RFC 6303: private + loopback + link-local reverse DNS
@@ -932,4 +1091,470 @@ mod tests {
"error message must be preserved for logging"
);
}
// ---- Full-pipeline resolve_query tests ----
/// Send a query through the full resolve_query pipeline and return
/// the parsed response + query path.
async fn resolve_in_test(
ctx: &Arc<ServerCtx>,
domain: &str,
qtype: QueryType,
) -> (DnsPacket, QueryPath) {
let query = DnsPacket::query(0xBEEF, domain, qtype);
let mut buf = BytePacketBuffer::new();
query.write(&mut buf).unwrap();
let raw = &buf.buf[..buf.pos];
let src: SocketAddr = "127.0.0.1:1234".parse().unwrap();
let (resp_buf, path) = resolve_query(query, raw, src, ctx, Transport::Udp)
.await
.unwrap();
let mut resp_parse_buf = BytePacketBuffer::from_bytes(resp_buf.filled());
let resp = DnsPacket::from_buffer(&mut resp_parse_buf).unwrap();
(resp, path)
}
#[tokio::test]
async fn special_use_private_ptr_returns_nxdomain() {
let ctx = Arc::new(crate::testutil::test_ctx().await);
let (resp, path) =
resolve_in_test(&ctx, "153.188.168.192.in-addr.arpa", QueryType::PTR).await;
assert_eq!(path, QueryPath::Local);
assert_eq!(resp.header.rescode, ResultCode::NXDOMAIN);
}
#[tokio::test]
async fn forwarding_rule_overrides_special_use_domain() {
let mut resp = DnsPacket::new();
resp.header.response = true;
resp.header.rescode = ResultCode::NOERROR;
let upstream_addr = crate::testutil::mock_upstream(resp).await;
let mut ctx = crate::testutil::test_ctx().await;
ctx.forwarding_rules = vec![ForwardingRule::new(
"168.192.in-addr.arpa".to_string(),
UpstreamPool::new(vec![Upstream::Udp(upstream_addr)], vec![]),
)];
let ctx = Arc::new(ctx);
let (resp, path) =
resolve_in_test(&ctx, "153.188.168.192.in-addr.arpa", QueryType::PTR).await;
assert_eq!(
path,
QueryPath::Forwarded,
"forwarding rule must take precedence over special-use NXDOMAIN"
);
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
}
#[tokio::test]
async fn pipeline_override_takes_precedence() {
let ctx = crate::testutil::test_ctx().await;
ctx.overrides
.write()
.unwrap()
.insert("override.test", "1.2.3.4", 60, None)
.unwrap();
let ctx = Arc::new(ctx);
let (resp, path) = resolve_in_test(&ctx, "override.test", QueryType::A).await;
assert_eq!(path, QueryPath::Overridden);
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
assert_eq!(resp.answers.len(), 1);
}
#[tokio::test]
async fn pipeline_localhost_resolves_to_loopback() {
let ctx = Arc::new(crate::testutil::test_ctx().await);
let (resp, path) = resolve_in_test(&ctx, "localhost", QueryType::A).await;
assert_eq!(path, QueryPath::Local);
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
match &resp.answers[0] {
DnsRecord::A { addr, .. } => assert_eq!(*addr, Ipv4Addr::LOCALHOST),
other => panic!("expected A record, got {:?}", other),
}
}
#[tokio::test]
async fn pipeline_localhost_subdomain_resolves_to_loopback() {
let ctx = Arc::new(crate::testutil::test_ctx().await);
let (resp, path) = resolve_in_test(&ctx, "app.localhost", QueryType::A).await;
assert_eq!(path, QueryPath::Local);
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
match &resp.answers[0] {
DnsRecord::A { addr, .. } => assert_eq!(*addr, Ipv4Addr::LOCALHOST),
other => panic!("expected A record, got {:?}", other),
}
}
#[tokio::test]
async fn pipeline_local_zone_returns_configured_record() {
let mut ctx = crate::testutil::test_ctx().await;
let mut inner = HashMap::new();
inner.insert(
QueryType::A,
vec![DnsRecord::A {
domain: "myapp.test".to_string(),
addr: Ipv4Addr::new(10, 0, 0, 42),
ttl: 300,
}],
);
ctx.zone_map.insert("myapp.test".to_string(), inner);
let ctx = Arc::new(ctx);
let (resp, path) = resolve_in_test(&ctx, "myapp.test", QueryType::A).await;
assert_eq!(path, QueryPath::Local);
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
match &resp.answers[0] {
DnsRecord::A { addr, .. } => assert_eq!(*addr, Ipv4Addr::new(10, 0, 0, 42)),
other => panic!("expected A record, got {:?}", other),
}
}
#[tokio::test]
async fn pipeline_tld_proxy_resolves_service() {
let ctx = crate::testutil::test_ctx().await;
ctx.services.lock().unwrap().insert("grafana", 3000);
let ctx = Arc::new(ctx);
let (resp, path) = resolve_in_test(&ctx, "grafana.numa", QueryType::A).await;
assert_eq!(path, QueryPath::Local);
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
match &resp.answers[0] {
DnsRecord::A { addr, .. } => assert_eq!(*addr, Ipv4Addr::LOCALHOST),
other => panic!("expected A record, got {:?}", other),
}
}
#[tokio::test]
async fn pipeline_filter_aaaa_returns_nodata() {
let mut ctx = crate::testutil::test_ctx().await;
ctx.filter_aaaa = true;
let ctx = Arc::new(ctx);
let (resp, path) = resolve_in_test(&ctx, "example.com", QueryType::AAAA).await;
assert_eq!(path, QueryPath::Local);
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
assert!(resp.answers.is_empty(), "AAAA must be filtered to NODATA");
}
#[tokio::test]
async fn pipeline_filter_aaaa_leaves_a_queries_alone() {
let mut upstream_resp = DnsPacket::new();
upstream_resp.header.response = true;
upstream_resp.header.rescode = ResultCode::NOERROR;
upstream_resp.answers.push(DnsRecord::A {
domain: "example.com".to_string(),
addr: Ipv4Addr::new(93, 184, 216, 34),
ttl: 300,
});
let upstream_addr = crate::testutil::mock_upstream(upstream_resp).await;
let mut ctx = crate::testutil::test_ctx().await;
ctx.filter_aaaa = true;
ctx.upstream_pool
.lock()
.unwrap()
.set_primary(vec![Upstream::Udp(upstream_addr)]);
let ctx = Arc::new(ctx);
let (resp, path) = resolve_in_test(&ctx, "example.com", QueryType::A).await;
assert_eq!(path, QueryPath::Upstream);
assert_eq!(resp.answers.len(), 1);
}
#[tokio::test]
async fn pipeline_filter_aaaa_respects_override() {
let mut ctx = crate::testutil::test_ctx().await;
ctx.filter_aaaa = true;
ctx.overrides
.write()
.unwrap()
.insert("v6.test", "2001:db8::1", 60, None)
.unwrap();
let ctx = Arc::new(ctx);
let (resp, path) = resolve_in_test(&ctx, "v6.test", QueryType::AAAA).await;
assert_eq!(path, QueryPath::Overridden);
assert_eq!(resp.answers.len(), 1, "override must win over filter");
}
#[tokio::test]
async fn pipeline_filter_aaaa_strips_ipv6hint_from_https_and_svcb() {
let rdata = crate::svcb::build_rdata(
1,
&[],
&[
(1, vec![0x02, b'h', b'3']),
(
6,
vec![
0x26, 0x06, 0x47, 0x00, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x01,
],
),
],
);
let mut pkt = DnsPacket::new();
pkt.header.response = true;
pkt.header.rescode = ResultCode::NOERROR;
pkt.questions.push(crate::question::DnsQuestion {
name: "hints.test".to_string(),
qtype: QueryType::HTTPS,
});
pkt.answers.push(DnsRecord::UNKNOWN {
domain: "hints.test".to_string(),
qtype: 65,
data: rdata.clone(),
ttl: 300,
});
let mut svcb_pkt = pkt.clone();
svcb_pkt.questions[0].name = "svc.test".to_string();
svcb_pkt.questions[0].qtype = QueryType::SVCB;
if let DnsRecord::UNKNOWN { domain, qtype, .. } = &mut svcb_pkt.answers[0] {
*domain = "svc.test".to_string();
*qtype = 64;
}
let mut ctx = crate::testutil::test_ctx().await;
ctx.filter_aaaa = true;
ctx.cache
.write()
.unwrap()
.insert("hints.test", QueryType::HTTPS, &pkt);
ctx.cache
.write()
.unwrap()
.insert("svc.test", QueryType::SVCB, &svcb_pkt);
let ctx = Arc::new(ctx);
for (name, qtype, label) in [
("hints.test", QueryType::HTTPS, "HTTPS"),
("svc.test", QueryType::SVCB, "SVCB"),
] {
let (resp, path) = resolve_in_test(&ctx, name, qtype).await;
assert_eq!(path, QueryPath::Cached, "{label}");
assert_eq!(resp.answers.len(), 1, "{label}");
match &resp.answers[0] {
DnsRecord::UNKNOWN { data, .. } => {
assert!(
data.len() < rdata.len(),
"{label}: ipv6hint (20 bytes) must be removed"
);
// Bytes for key=6 must not appear at any 4-byte boundary in the
// params section — cheap structural check.
assert!(
!data.windows(4).any(|w| w == [0, 6, 0, 16]),
"{label}: ipv6hint TLV header must be absent"
);
}
other => panic!("{label}: expected UNKNOWN record, got {other:?}"),
}
}
}
#[tokio::test]
async fn pipeline_filter_aaaa_preserves_ipv6hint_for_dnssec_clients() {
// Regression guard for the DO-bit gate in resolve_query: modifying
// HTTPS rdata invalidates any accompanying RRSIG, so a DO=1 client
// must receive the record untouched even when filter_aaaa is on.
let rdata = crate::svcb::build_rdata(
1,
&[],
&[(
6,
vec![
0x26, 0x06, 0x47, 0x00, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x01,
],
)],
);
let mut pkt = DnsPacket::new();
pkt.header.response = true;
pkt.header.rescode = ResultCode::NOERROR;
pkt.questions.push(crate::question::DnsQuestion {
name: "hints.test".to_string(),
qtype: QueryType::HTTPS,
});
pkt.answers.push(DnsRecord::UNKNOWN {
domain: "hints.test".to_string(),
qtype: 65,
data: rdata.clone(),
ttl: 300,
});
let mut ctx = crate::testutil::test_ctx().await;
ctx.filter_aaaa = true;
ctx.cache
.write()
.unwrap()
.insert("hints.test", QueryType::HTTPS, &pkt);
let ctx = Arc::new(ctx);
// Build a query with EDNS DO bit set — can't use resolve_in_test
// because it constructs a plain query without EDNS.
let mut query = DnsPacket::query(0xBEEF, "hints.test", QueryType::HTTPS);
query.edns = Some(crate::packet::EdnsOpt {
do_bit: true,
..Default::default()
});
let mut buf = BytePacketBuffer::new();
query.write(&mut buf).unwrap();
let raw = &buf.buf[..buf.pos];
let src: SocketAddr = "127.0.0.1:1234".parse().unwrap();
let (resp_buf, _) = resolve_query(query, raw, src, &ctx, Transport::Udp)
.await
.unwrap();
let mut resp_parse_buf = BytePacketBuffer::from_bytes(resp_buf.filled());
let resp = DnsPacket::from_buffer(&mut resp_parse_buf).unwrap();
match &resp.answers[0] {
DnsRecord::UNKNOWN { data, .. } => {
assert_eq!(
data, &rdata,
"ipv6hint must be preserved for DO-bit clients"
);
}
other => panic!("expected UNKNOWN record, got {:?}", other),
}
}
#[tokio::test]
async fn pipeline_blocklist_sinkhole() {
let ctx = crate::testutil::test_ctx().await;
let mut domains = std::collections::HashSet::new();
domains.insert("ads.tracker.test".to_string());
ctx.blocklist.write().unwrap().swap_domains(domains, vec![]);
let ctx = Arc::new(ctx);
let (resp, path) = resolve_in_test(&ctx, "ads.tracker.test", QueryType::A).await;
assert_eq!(path, QueryPath::Blocked);
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
match &resp.answers[0] {
DnsRecord::A { addr, .. } => assert_eq!(*addr, Ipv4Addr::UNSPECIFIED),
other => panic!("expected sinkhole A record, got {:?}", other),
}
}
#[tokio::test]
async fn pipeline_cache_hit() {
let ctx = Arc::new(crate::testutil::test_ctx().await);
// Pre-populate cache with a response
let mut pkt = DnsPacket::new();
pkt.header.response = true;
pkt.header.rescode = ResultCode::NOERROR;
pkt.questions.push(crate::question::DnsQuestion {
name: "cached.test".to_string(),
qtype: QueryType::A,
});
pkt.answers.push(DnsRecord::A {
domain: "cached.test".to_string(),
addr: Ipv4Addr::new(5, 5, 5, 5),
ttl: 3600,
});
ctx.cache
.write()
.unwrap()
.insert("cached.test", QueryType::A, &pkt);
let (resp, path) = resolve_in_test(&ctx, "cached.test", QueryType::A).await;
assert_eq!(path, QueryPath::Cached);
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
}
#[tokio::test]
async fn pipeline_forwarding_returns_upstream_answer() {
let mut upstream_resp = DnsPacket::new();
upstream_resp.header.response = true;
upstream_resp.header.rescode = ResultCode::NOERROR;
upstream_resp.answers.push(DnsRecord::A {
domain: "internal.corp".to_string(),
addr: Ipv4Addr::new(10, 1, 2, 3),
ttl: 600,
});
let upstream_addr = crate::testutil::mock_upstream(upstream_resp).await;
let mut ctx = crate::testutil::test_ctx().await;
ctx.forwarding_rules = vec![ForwardingRule::new(
"corp".to_string(),
UpstreamPool::new(vec![Upstream::Udp(upstream_addr)], vec![]),
)];
let ctx = Arc::new(ctx);
let (resp, path) = resolve_in_test(&ctx, "internal.corp", QueryType::A).await;
assert_eq!(path, QueryPath::Forwarded);
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
assert_eq!(resp.answers.len(), 1);
match &resp.answers[0] {
DnsRecord::A { domain, addr, .. } => {
assert_eq!(domain, "internal.corp");
assert_eq!(*addr, Ipv4Addr::new(10, 1, 2, 3));
}
other => panic!("expected A record, got {:?}", other),
}
}
#[tokio::test]
async fn pipeline_forwarding_fails_over_to_second_upstream() {
let dead = crate::testutil::blackhole_upstream();
let mut live_resp = DnsPacket::new();
live_resp.header.response = true;
live_resp.header.rescode = ResultCode::NOERROR;
live_resp.answers.push(DnsRecord::A {
domain: "internal.corp".to_string(),
addr: Ipv4Addr::new(10, 9, 9, 9),
ttl: 600,
});
let live = crate::testutil::mock_upstream(live_resp).await;
let mut ctx = crate::testutil::test_ctx().await;
ctx.forwarding_rules = vec![ForwardingRule::new(
"corp".to_string(),
UpstreamPool::new(vec![Upstream::Udp(dead), Upstream::Udp(live)], vec![]),
)];
let ctx = Arc::new(ctx);
let (resp, path) = resolve_in_test(&ctx, "internal.corp", QueryType::A).await;
assert_eq!(path, QueryPath::Forwarded);
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
assert_eq!(resp.answers.len(), 1);
match &resp.answers[0] {
DnsRecord::A { addr, .. } => assert_eq!(*addr, Ipv4Addr::new(10, 9, 9, 9)),
other => panic!("expected A record, got {:?}", other),
}
}
#[tokio::test]
async fn pipeline_default_pool_reports_upstream_path() {
let mut upstream_resp = DnsPacket::new();
upstream_resp.header.response = true;
upstream_resp.header.rescode = ResultCode::NOERROR;
upstream_resp.answers.push(DnsRecord::A {
domain: "example.com".to_string(),
addr: Ipv4Addr::new(93, 184, 216, 34),
ttl: 300,
});
let upstream_addr = crate::testutil::mock_upstream(upstream_resp).await;
let ctx = crate::testutil::test_ctx().await;
ctx.upstream_pool
.lock()
.unwrap()
.set_primary(vec![Upstream::Udp(upstream_addr)]);
let ctx = Arc::new(ctx);
let (resp, path) = resolve_in_test(&ctx, "example.com", QueryType::A).await;
assert_eq!(path, QueryPath::Upstream);
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
assert_eq!(resp.answers.len(), 1);
}
}

224
src/doh.rs Normal file
View File

@@ -0,0 +1,224 @@
use std::net::SocketAddr;
use axum::body::Bytes;
use axum::extract::{Request, State};
use axum::response::{IntoResponse, Response};
use hyper::StatusCode;
use log::warn;
use crate::buffer::BytePacketBuffer;
use crate::ctx::{resolve_query, ServerCtx};
use crate::header::ResultCode;
use crate::packet::DnsPacket;
use crate::stats::Transport;
const MAX_DNS_MSG: usize = 4096;
const DOH_CONTENT_TYPE: &str = "application/dns-message";
pub async fn doh_post(State(state): State<super::proxy::DohState>, req: Request) -> Response {
let host = super::proxy::extract_host(&req);
if !is_doh_host(host.as_deref(), &state.ctx.proxy_tld) {
return StatusCode::NOT_FOUND.into_response();
}
let content_type = req
.headers()
.get(hyper::header::CONTENT_TYPE)
.and_then(|v| v.to_str().ok())
.unwrap_or("");
if !content_type.starts_with(DOH_CONTENT_TYPE) {
return StatusCode::UNSUPPORTED_MEDIA_TYPE.into_response();
}
let body = match axum::body::to_bytes(req.into_body(), MAX_DNS_MSG).await {
Ok(b) => b,
Err(_) => {
return (StatusCode::PAYLOAD_TOO_LARGE, "body exceeds 4096 bytes").into_response()
}
};
if body.is_empty() {
return (StatusCode::BAD_REQUEST, "empty body").into_response();
}
let src = state
.remote_addr
.unwrap_or_else(|| SocketAddr::from(([127, 0, 0, 1], 0)));
resolve_doh(&body, src, &state.ctx).await
}
fn is_doh_host(host: Option<&str>, tld: &str) -> bool {
let h = match host {
Some(h) => h,
None => return false,
};
let base = strip_port(h).unwrap_or(h);
is_loopback_host(base) || is_tld_match(base, tld)
}
fn strip_port(h: &str) -> Option<&str> {
if h.starts_with('[') {
// [::1]:443 → [::1]
let (base, port) = h.rsplit_once("]:")?;
port.bytes()
.all(|b| b.is_ascii_digit())
.then(|| &h[..base.len() + 1])
} else {
let (base, port) = h.rsplit_once(':')?;
// Bare IPv6 like "::1" has multiple colons — not a port suffix
if base.contains(':') {
return None;
}
port.bytes().all(|b| b.is_ascii_digit()).then_some(base)
}
}
fn is_loopback_host(h: &str) -> bool {
matches!(h, "127.0.0.1" | "::1" | "[::1]" | "localhost")
}
fn is_tld_match(h: &str, tld: &str) -> bool {
h == tld
|| (h.len() == 2 * tld.len() + 1
&& h.starts_with(tld)
&& h.as_bytes().get(tld.len()) == Some(&b'.')
&& h.ends_with(tld))
}
async fn resolve_doh(
dns_bytes: &[u8],
src: SocketAddr,
ctx: &std::sync::Arc<ServerCtx>,
) -> Response {
let mut buffer = BytePacketBuffer::from_bytes(dns_bytes);
let query = match DnsPacket::from_buffer(&mut buffer) {
Ok(q) => q,
Err(e) => {
warn!("DoH: parse error from {}: {}", src, e);
let query_id = u16::from_be_bytes([
dns_bytes.first().copied().unwrap_or(0),
dns_bytes.get(1).copied().unwrap_or(0),
]);
let mut resp = DnsPacket::new();
resp.header.id = query_id;
resp.header.response = true;
resp.header.rescode = ResultCode::FORMERR;
return serialize_response(&resp);
}
};
let query_id = query.header.id;
let query_rd = query.header.recursion_desired;
let questions = query.questions.clone();
match resolve_query(query, dns_bytes, src, ctx, Transport::Doh).await {
Ok((resp_buffer, _)) => {
let min_ttl = extract_min_ttl(resp_buffer.filled());
dns_response(resp_buffer.filled(), min_ttl)
}
Err(e) => {
warn!("DoH: resolve error for {}: {}", src, e);
let mut resp = DnsPacket::new();
resp.header.id = query_id;
resp.header.response = true;
resp.header.recursion_desired = query_rd;
resp.header.recursion_available = true;
resp.header.rescode = ResultCode::SERVFAIL;
resp.questions = questions;
serialize_response(&resp)
}
}
}
fn extract_min_ttl(wire: &[u8]) -> u32 {
crate::wire::scan_ttl_offsets(wire)
.ok()
.and_then(|meta| crate::wire::min_ttl_from_wire(wire, &meta))
.unwrap_or(0)
}
fn dns_response(wire: &[u8], min_ttl: u32) -> Response {
(
StatusCode::OK,
[
(hyper::header::CONTENT_TYPE, DOH_CONTENT_TYPE),
(
hyper::header::CACHE_CONTROL,
&format!("max-age={}", min_ttl),
),
],
Bytes::copy_from_slice(wire),
)
.into_response()
}
fn serialize_response(pkt: &DnsPacket) -> Response {
let mut buf = BytePacketBuffer::new();
match pkt.write(&mut buf) {
Ok(_) => dns_response(buf.filled(), 0),
Err(_) => StatusCode::INTERNAL_SERVER_ERROR.into_response(),
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::buffer::BytePacketBuffer;
use crate::header::ResultCode;
use crate::packet::DnsPacket;
use crate::record::DnsRecord;
#[test]
fn is_doh_host_matches_tld() {
assert!(is_doh_host(Some("numa"), "numa"));
assert!(is_doh_host(Some("numa.numa"), "numa"));
assert!(is_doh_host(Some("127.0.0.1"), "numa"));
assert!(is_doh_host(Some("127.0.0.1:443"), "numa"));
assert!(is_doh_host(Some("::1"), "numa"));
assert!(is_doh_host(Some("[::1]"), "numa"));
assert!(is_doh_host(Some("[::1]:443"), "numa"));
assert!(is_doh_host(Some("localhost"), "numa"));
assert!(is_doh_host(Some("localhost:443"), "numa"));
assert!(!is_doh_host(Some("foo.numa"), "numa"));
assert!(!is_doh_host(None, "numa"));
}
#[test]
fn extract_min_ttl_from_response() {
let mut pkt = DnsPacket::new();
pkt.header.response = true;
pkt.answers.push(DnsRecord::A {
domain: "example.com".to_string(),
addr: std::net::Ipv4Addr::new(1, 2, 3, 4),
ttl: 300,
});
pkt.answers.push(DnsRecord::A {
domain: "example.com".to_string(),
addr: std::net::Ipv4Addr::new(5, 6, 7, 8),
ttl: 60,
});
let mut buf = BytePacketBuffer::new();
pkt.write(&mut buf).unwrap();
assert_eq!(extract_min_ttl(buf.filled()), 60);
}
#[test]
fn extract_min_ttl_no_answers() {
let mut pkt = DnsPacket::new();
pkt.header.response = true;
let mut buf = BytePacketBuffer::new();
pkt.write(&mut buf).unwrap();
assert_eq!(extract_min_ttl(buf.filled()), 0);
}
#[test]
fn serialize_formerr_response() {
let mut pkt = DnsPacket::new();
pkt.header.id = 0xABCD;
pkt.header.response = true;
pkt.header.rescode = ResultCode::FORMERR;
let resp = serialize_response(&pkt);
assert_eq!(resp.status(), StatusCode::OK);
}
}

View File

@@ -15,6 +15,7 @@ use crate::config::DotConfig;
use crate::ctx::{resolve_query, ServerCtx};
use crate::header::ResultCode;
use crate::packet::DnsPacket;
use crate::stats::Transport;
const MAX_CONNECTIONS: usize = 512;
const IDLE_TIMEOUT: Duration = Duration::from_secs(30);
@@ -153,8 +154,11 @@ async fn accept_loop(listener: TcpListener, acceptor: TlsAcceptor, ctx: Arc<Serv
/// Handle a single persistent DoT connection (RFC 7858).
/// Reads length-prefixed DNS queries until EOF, idle timeout, or error.
async fn handle_dot_connection<S>(mut stream: S, remote_addr: SocketAddr, ctx: &ServerCtx)
where
async fn handle_dot_connection<S>(
mut stream: S,
remote_addr: SocketAddr,
ctx: &std::sync::Arc<ServerCtx>,
) where
S: AsyncReadExt + AsyncWriteExt + Unpin,
{
loop {
@@ -177,8 +181,6 @@ where
break;
};
// Parse query up-front so we can echo its question section in SERVFAIL
// responses when resolve_query fails.
let query = match DnsPacket::from_buffer(&mut buffer) {
Ok(q) => q,
Err(e) => {
@@ -200,8 +202,16 @@ where
}
};
match resolve_query(query.clone(), remote_addr, ctx).await {
Ok(resp_buffer) => {
match resolve_query(
query.clone(),
&buffer.buf[..msg_len],
remote_addr,
ctx,
Transport::Dot,
)
.await
{
Ok((resp_buffer, _)) => {
if write_framed(&mut stream, resp_buffer.filled())
.await
.is_err()
@@ -269,7 +279,7 @@ where
mod tests {
use super::*;
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};
use std::sync::Mutex;
use rcgen::{CertificateParams, DnType, KeyPair};
use rustls::pki_types::{CertificateDer, PrivateKeyDer, PrivatePkcs8KeyDer, ServerName};
@@ -334,54 +344,29 @@ mod tests {
async fn spawn_dot_server() -> (SocketAddr, CertificateDer<'static>) {
let (server_tls, cert_der) = test_tls_configs();
let socket = tokio::net::UdpSocket::bind("127.0.0.1:0").await.unwrap();
// Bind an unresponsive upstream and leak it so it lives for the test duration.
let blackhole = Box::leak(Box::new(std::net::UdpSocket::bind("127.0.0.1:0").unwrap()));
let upstream_addr = blackhole.local_addr().unwrap();
let ctx = Arc::new(ServerCtx {
socket,
zone_map: {
let mut m = HashMap::new();
let mut inner = HashMap::new();
inner.insert(
QueryType::A,
vec![DnsRecord::A {
domain: "dot-test.example".to_string(),
addr: std::net::Ipv4Addr::new(10, 0, 0, 1),
ttl: 300,
}],
);
m.insert("dot-test.example".to_string(), inner);
m
},
cache: RwLock::new(crate::cache::DnsCache::new(100, 60, 86400)),
stats: Mutex::new(crate::stats::ServerStats::new()),
overrides: RwLock::new(crate::override_store::OverrideStore::new()),
blocklist: RwLock::new(crate::blocklist::BlocklistStore::new()),
query_log: Mutex::new(crate::query_log::QueryLog::new(100)),
services: Mutex::new(crate::service_store::ServiceStore::new()),
lan_peers: Mutex::new(crate::lan::PeerStore::new(90)),
forwarding_rules: Vec::new(),
upstream: Mutex::new(crate::forward::Upstream::Udp(upstream_addr)),
upstream_auto: false,
upstream_port: 53,
lan_ip: Mutex::new(std::net::Ipv4Addr::LOCALHOST),
timeout: Duration::from_millis(200),
proxy_tld: "numa".to_string(),
proxy_tld_suffix: ".numa".to_string(),
lan_enabled: false,
config_path: String::new(),
config_found: false,
config_dir: std::path::PathBuf::from("/tmp"),
data_dir: std::path::PathBuf::from("/tmp"),
tls_config: Some(arc_swap::ArcSwap::from(server_tls)),
upstream_mode: crate::config::UpstreamMode::Forward,
root_hints: Vec::new(),
srtt: RwLock::new(crate::srtt::SrttCache::new(true)),
inflight: Mutex::new(HashMap::new()),
dnssec_enabled: false,
dnssec_strict: false,
});
let upstream_addr = crate::testutil::blackhole_upstream();
let mut ctx = crate::testutil::test_ctx().await;
ctx.zone_map = {
let mut m = HashMap::new();
let mut inner = HashMap::new();
inner.insert(
QueryType::A,
vec![DnsRecord::A {
domain: "dot-test.example".to_string(),
addr: std::net::Ipv4Addr::new(10, 0, 0, 1),
ttl: 300,
}],
);
m.insert("dot-test.example".to_string(), inner);
m
};
ctx.upstream_pool = Mutex::new(crate::forward::UpstreamPool::new(
vec![crate::forward::Upstream::Udp(upstream_addr)],
vec![],
));
ctx.tls_config = Some(arc_swap::ArcSwap::from(server_tls));
let ctx = Arc::new(ctx);
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let addr = listener.local_addr().unwrap();

View File

@@ -1,12 +1,16 @@
use std::fmt;
use std::net::SocketAddr;
use std::time::Duration;
use std::net::{IpAddr, SocketAddr};
use std::sync::{Arc, RwLock};
use std::time::{Duration, Instant};
use tokio::net::UdpSocket;
use tokio::time::timeout;
use crate::buffer::BytePacketBuffer;
use crate::odoh::{query_through_relay, OdohConfigCache};
use crate::packet::DnsPacket;
use crate::srtt::SrttCache;
use crate::stats::UpstreamTransport;
use crate::Result;
#[derive(Clone)]
@@ -16,6 +20,41 @@ pub enum Upstream {
url: String,
client: reqwest::Client,
},
Dot {
addr: SocketAddr,
tls_name: Option<String>,
connector: tokio_rustls::TlsConnector,
},
/// Oblivious DNS-over-HTTPS (RFC 9230). Queries are HPKE-sealed to the
/// target and forwarded through an independent relay. Target host lives
/// on `target_config` (single source of truth — the cache keys on it).
Odoh {
relay_url: String,
target_path: String,
client: reqwest::Client,
target_config: Arc<OdohConfigCache>,
},
}
impl Upstream {
/// IP address to key SRTT tracking on, if the upstream has a stable one.
/// `Doh` and `Odoh` route through a URL + connection pool, so there's no
/// single IP to track; SRTT is skipped for them.
pub fn tracked_ip(&self) -> Option<IpAddr> {
match self {
Upstream::Udp(addr) | Upstream::Dot { addr, .. } => Some(addr.ip()),
Upstream::Doh { .. } | Upstream::Odoh { .. } => None,
}
}
pub fn transport(&self) -> UpstreamTransport {
match self {
Upstream::Udp(_) => UpstreamTransport::Udp,
Upstream::Doh { .. } => UpstreamTransport::Doh,
Upstream::Dot { .. } => UpstreamTransport::Dot,
Upstream::Odoh { .. } => UpstreamTransport::Odoh,
}
}
}
impl PartialEq for Upstream {
@@ -23,16 +62,206 @@ impl PartialEq for Upstream {
match (self, other) {
(Self::Udp(a), Self::Udp(b)) => a == b,
(Self::Doh { url: a, .. }, Self::Doh { url: b, .. }) => a == b,
(Self::Dot { addr: a, .. }, Self::Dot { addr: b, .. }) => a == b,
(
Self::Odoh {
relay_url: ra,
target_path: pa,
target_config: ca,
..
},
Self::Odoh {
relay_url: rb,
target_path: pb,
target_config: cb,
..
},
) => ra == rb && pa == pb && ca.target_host() == cb.target_host(),
_ => false,
}
}
}
impl fmt::Debug for Upstream {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
fmt::Display::fmt(self, f)
}
}
impl fmt::Display for Upstream {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
Upstream::Udp(addr) => write!(f, "{}", addr),
Upstream::Doh { url, .. } => f.write_str(url),
Upstream::Dot { addr, tls_name, .. } => match tls_name {
Some(name) => write!(f, "tls://{}#{}", addr, name),
None => write!(f, "tls://{}", addr),
},
Upstream::Odoh {
relay_url,
target_path,
target_config,
..
} => write!(
f,
"odoh://{}{} via {}",
target_config.target_host(),
target_path,
relay_url
),
}
}
}
pub(crate) fn parse_upstream_addr(
s: &str,
default_port: u16,
) -> std::result::Result<SocketAddr, String> {
// Try full socket addr first: "1.2.3.4:5353" or "[::1]:5353"
if let Ok(addr) = s.parse::<SocketAddr>() {
return Ok(addr);
}
// Bare IP: "1.2.3.4" or "::1"
if let Ok(ip) = s.parse::<IpAddr>() {
return Ok(SocketAddr::new(ip, default_port));
}
Err(format!("invalid upstream address: {}", s))
}
/// Parse a slice of upstream address strings into `Upstream` values, failing
/// on the first invalid entry.
pub fn parse_upstream_list(addrs: &[String], default_port: u16) -> Result<Vec<Upstream>> {
addrs
.iter()
.map(|s| parse_upstream(s, default_port))
.collect()
}
pub fn parse_upstream(s: &str, default_port: u16) -> Result<Upstream> {
if s.starts_with("https://") {
return Ok(Upstream::Doh {
url: s.to_string(),
client: build_https_client(),
});
}
// tls://IP:PORT#hostname or tls://IP#hostname (default port 853)
if let Some(rest) = s.strip_prefix("tls://") {
let (addr_part, tls_name) = match rest.find('#') {
Some(i) => (&rest[..i], Some(rest[i + 1..].to_string())),
None => (rest, None),
};
let addr = parse_upstream_addr(addr_part, 853)?;
let connector = build_dot_connector()?;
return Ok(Upstream::Dot {
addr,
tls_name,
connector,
});
}
let addr = parse_upstream_addr(s, default_port)?;
Ok(Upstream::Udp(addr))
}
/// HTTP/2 client tuned for DoH/ODoH: small windows for low latency, long-lived
/// keep-alive. Shared by the DoH upstream and the ODoH config-fetcher +
/// seal/open path. Pool defaults to one idle conn per host — good for
/// resolvers that talk to a single upstream; relays that fan out to many
/// targets should use [`build_https_client_with_pool`].
pub fn build_https_client() -> reqwest::Client {
build_https_client_with_pool(1)
}
/// Same shape as [`build_https_client`], but caller picks
/// `pool_max_idle_per_host`. Relay workloads hit many distinct target hosts
/// and benefit from a larger pool so warm connections survive concurrent
/// fan-out.
pub fn build_https_client_with_pool(pool_max_idle_per_host: usize) -> reqwest::Client {
https_client_builder(pool_max_idle_per_host)
.build()
.unwrap_or_default()
}
/// HTTPS client for the ODoH upstream, with bootstrap-IP overrides applied
/// so relay/target hostname resolution can bypass system DNS.
pub fn build_odoh_client(odoh: &crate::config::OdohUpstream) -> reqwest::Client {
let mut builder = https_client_builder(1);
if let Some(addr) = odoh.relay_bootstrap {
builder = builder.resolve(&odoh.relay_host, addr);
}
if let Some(addr) = odoh.target_bootstrap {
builder = builder.resolve(&odoh.target_host, addr);
}
builder.build().unwrap_or_default()
}
fn https_client_builder(pool_max_idle_per_host: usize) -> reqwest::ClientBuilder {
reqwest::Client::builder()
.use_rustls_tls()
.http2_initial_stream_window_size(65_535)
.http2_initial_connection_window_size(65_535)
.http2_keep_alive_interval(Duration::from_secs(15))
.http2_keep_alive_while_idle(true)
.http2_keep_alive_timeout(Duration::from_secs(10))
.pool_idle_timeout(Duration::from_secs(300))
.pool_max_idle_per_host(pool_max_idle_per_host)
}
fn build_dot_connector() -> Result<tokio_rustls::TlsConnector> {
let _ = rustls::crypto::ring::default_provider().install_default();
let mut root_store = rustls::RootCertStore::empty();
root_store.extend(webpki_roots::TLS_SERVER_ROOTS.iter().cloned());
let config = rustls::ClientConfig::builder()
.with_root_certificates(root_store)
.with_no_client_auth();
Ok(tokio_rustls::TlsConnector::from(std::sync::Arc::new(
config,
)))
}
#[derive(Clone)]
pub struct UpstreamPool {
primary: Vec<Upstream>,
fallback: Vec<Upstream>,
}
impl UpstreamPool {
pub fn new(primary: Vec<Upstream>, fallback: Vec<Upstream>) -> Self {
Self { primary, fallback }
}
pub fn preferred(&self) -> Option<&Upstream> {
self.primary.first().or(self.fallback.first())
}
pub fn set_primary(&mut self, primary: Vec<Upstream>) {
self.primary = primary;
}
/// Update the primary upstream if `new_addr` (parsed with `port`) differs
/// from the current preferred upstream. Returns `true` if the pool changed.
pub fn maybe_update_primary(&mut self, new_addr: &str, port: u16) -> bool {
let Ok(new_sock) = format!("{}:{}", new_addr, port).parse::<SocketAddr>() else {
return false;
};
let new_upstream = Upstream::Udp(new_sock);
if self.preferred() == Some(&new_upstream) {
return false;
}
self.primary = vec![new_upstream];
true
}
pub fn label(&self) -> String {
match self.preferred() {
Some(u) => {
let total = self.primary.len() + self.fallback.len();
if total > 1 {
format!("{} (+{} more)", u, total - 1)
} else {
u.to_string()
}
}
None => "none".to_string(),
}
}
}
@@ -42,10 +271,11 @@ pub async fn forward_query(
upstream: &Upstream,
timeout_duration: Duration,
) -> Result<DnsPacket> {
match upstream {
Upstream::Udp(addr) => forward_udp(query, *addr, timeout_duration).await,
Upstream::Doh { url, client } => forward_doh(query, url, client, timeout_duration).await,
}
let mut send_buffer = BytePacketBuffer::new();
query.write(&mut send_buffer)?;
let data = forward_query_raw(send_buffer.filled(), upstream, timeout_duration).await?;
let mut recv_buffer = BytePacketBuffer::from_bytes(&data);
DnsPacket::from_buffer(&mut recv_buffer)
}
pub(crate) async fn forward_udp(
@@ -53,24 +283,10 @@ pub(crate) async fn forward_udp(
upstream: SocketAddr,
timeout_duration: Duration,
) -> Result<DnsPacket> {
let socket = UdpSocket::bind("0.0.0.0:0").await?;
let mut send_buffer = BytePacketBuffer::new();
query.write(&mut send_buffer)?;
socket.send_to(send_buffer.filled(), upstream).await?;
let mut recv_buffer = BytePacketBuffer::new();
let (size, _) = timeout(timeout_duration, socket.recv_from(&mut recv_buffer.buf)).await??;
if size == recv_buffer.buf.len() {
log::debug!(
"upstream response truncated ({} bytes, buffer {})",
size,
recv_buffer.buf.len()
);
}
let data = forward_udp_raw(send_buffer.filled(), upstream, timeout_duration).await?;
let mut recv_buffer = BytePacketBuffer::from_bytes(&data);
DnsPacket::from_buffer(&mut recv_buffer)
}
@@ -107,22 +323,224 @@ pub(crate) async fn forward_tcp(
DnsPacket::from_buffer(&mut recv_buffer)
}
async fn forward_doh(
query: &DnsPacket,
async fn forward_dot_raw(
wire: &[u8],
addr: SocketAddr,
tls_name: &Option<String>,
connector: &tokio_rustls::TlsConnector,
timeout_duration: Duration,
) -> Result<Vec<u8>> {
use rustls::pki_types::ServerName;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;
let server_name = match tls_name {
Some(name) => ServerName::try_from(name.clone())?,
None => ServerName::try_from(addr.ip().to_string())?,
};
let tcp = timeout(timeout_duration, TcpStream::connect(addr)).await??;
let mut tls = timeout(timeout_duration, connector.connect(server_name, tcp)).await??;
let mut outbuf = Vec::with_capacity(2 + wire.len());
outbuf.extend_from_slice(&(wire.len() as u16).to_be_bytes());
outbuf.extend_from_slice(wire);
timeout(timeout_duration, tls.write_all(&outbuf)).await??;
let mut len_buf = [0u8; 2];
timeout(timeout_duration, tls.read_exact(&mut len_buf)).await??;
let resp_len = u16::from_be_bytes(len_buf) as usize;
let mut data = vec![0u8; resp_len];
timeout(timeout_duration, tls.read_exact(&mut data)).await??;
Ok(data)
}
pub async fn forward_query_raw(
wire: &[u8],
upstream: &Upstream,
timeout_duration: Duration,
) -> Result<Vec<u8>> {
match upstream {
Upstream::Udp(addr) => forward_udp_raw(wire, *addr, timeout_duration).await,
Upstream::Doh { url, client } => forward_doh_raw(wire, url, client, timeout_duration).await,
Upstream::Dot {
addr,
tls_name,
connector,
} => forward_dot_raw(wire, *addr, tls_name, connector, timeout_duration).await,
Upstream::Odoh {
relay_url,
target_path,
client,
target_config,
} => {
query_through_relay(
wire,
relay_url,
target_path,
client,
target_config,
timeout_duration,
)
.await
}
}
}
pub async fn forward_with_hedging_raw(
wire: &[u8],
primary: &Upstream,
secondary: &Upstream,
hedge_delay: Duration,
timeout_duration: Duration,
) -> Result<Vec<u8>> {
use tokio::time::sleep;
let primary_fut = forward_query_raw(wire, primary, timeout_duration);
tokio::pin!(primary_fut);
let delay = sleep(hedge_delay);
tokio::pin!(delay);
// Phase 1: wait for either primary to return, or the hedge delay.
tokio::select! {
result = &mut primary_fut => return result,
_ = &mut delay => {}
}
// Phase 2: hedge delay expired — fire secondary while still polling primary.
let secondary_fut = forward_query_raw(wire, secondary, timeout_duration);
tokio::pin!(secondary_fut);
// First successful response wins. If one errors, wait for the other.
let mut primary_err: Option<crate::Error> = None;
let mut secondary_err: Option<crate::Error> = None;
loop {
tokio::select! {
r = &mut primary_fut, if primary_err.is_none() => {
match r {
Ok(resp) => return Ok(resp),
Err(e) => {
if let Some(se) = secondary_err.take() {
return Err(se);
}
primary_err = Some(e);
}
}
}
r = &mut secondary_fut, if secondary_err.is_none() => {
match r {
Ok(resp) => return Ok(resp),
Err(e) => {
if let Some(pe) = primary_err.take() {
return Err(pe);
}
secondary_err = Some(e);
}
}
}
}
match (primary_err, secondary_err) {
(Some(pe), Some(_)) => return Err(pe),
(pe, se) => {
primary_err = pe;
secondary_err = se;
}
}
}
}
pub async fn forward_with_failover_raw(
wire: &[u8],
pool: &UpstreamPool,
srtt: &RwLock<SrttCache>,
timeout_duration: Duration,
hedge_delay: Duration,
) -> Result<Vec<u8>> {
let mut candidates: Vec<(usize, u64)> = {
let srtt_read = srtt.read().unwrap();
pool.primary
.iter()
.enumerate()
.map(|(i, u)| {
let rtt = u.tracked_ip().map(|ip| srtt_read.get(ip)).unwrap_or(0);
(i, rtt)
})
.collect()
};
candidates.sort_by_key(|&(_, rtt)| rtt);
let all_upstreams: Vec<&Upstream> = candidates
.iter()
.map(|&(i, _)| &pool.primary[i])
.chain(pool.fallback.iter())
.collect();
let mut last_err: Option<Box<dyn std::error::Error + Send + Sync>> = None;
for upstream in &all_upstreams {
let start = Instant::now();
let result = if !hedge_delay.is_zero() {
// Hedge against the same upstream: independent h2 streams (DoH),
// independent UDP packets (plain DNS), or independent TLS
// connections (DoT). Rescues packet loss, dispatch spikes, and
// TLS handshake stalls.
forward_with_hedging_raw(wire, upstream, upstream, hedge_delay, timeout_duration).await
} else {
forward_query_raw(wire, upstream, timeout_duration).await
};
match result {
Ok(resp) => {
if let Some(ip) = upstream.tracked_ip() {
let rtt_ms = start.elapsed().as_millis() as u64;
srtt.write().unwrap().record_rtt(ip, rtt_ms, false);
}
return Ok(resp);
}
Err(e) => {
if let Some(ip) = upstream.tracked_ip() {
srtt.write().unwrap().record_failure(ip);
}
log::debug!("upstream {} failed: {}", upstream, e);
last_err = Some(e);
}
}
}
Err(last_err.unwrap_or_else(|| "no upstream configured".into()))
}
async fn forward_udp_raw(
wire: &[u8],
upstream: SocketAddr,
timeout_duration: Duration,
) -> Result<Vec<u8>> {
let socket = UdpSocket::bind("0.0.0.0:0").await?;
socket.send_to(wire, upstream).await?;
let mut recv_buf = vec![0u8; 4096];
let (size, _) = timeout(timeout_duration, socket.recv_from(&mut recv_buf)).await??;
recv_buf.truncate(size);
Ok(recv_buf)
}
async fn forward_doh_raw(
wire: &[u8],
url: &str,
client: &reqwest::Client,
timeout_duration: Duration,
) -> Result<DnsPacket> {
let mut send_buffer = BytePacketBuffer::new();
query.write(&mut send_buffer)?;
) -> Result<Vec<u8>> {
let resp = timeout(
timeout_duration,
client
.post(url)
.header("content-type", "application/dns-message")
.header("accept", "application/dns-message")
.body(send_buffer.filled().to_vec())
.body(wire.to_vec())
.send(),
)
.await??
@@ -130,9 +548,25 @@ async fn forward_doh(
let bytes = resp.bytes().await?;
log::debug!("DoH response: {} bytes", bytes.len());
Ok(bytes.to_vec())
}
let mut recv_buffer = BytePacketBuffer::from_bytes(&bytes);
DnsPacket::from_buffer(&mut recv_buffer)
/// Send a lightweight keepalive query to a DoH upstream to prevent
/// the HTTP/2 + TLS connection from going idle and being torn down.
pub async fn keepalive_doh(upstream: &Upstream) {
if let Upstream::Doh { url, client } = upstream {
// Query for . NS — minimal, always succeeds, response is small
let wire: &[u8] = &[
0x00, 0x00, // ID
0x01, 0x00, // flags: RD=1
0x00, 0x01, // QDCOUNT=1
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // AN=0, NS=0, AR=0
0x00, // root name (.)
0x00, 0x02, // type NS
0x00, 0x01, // class IN
];
let _ = forward_doh_raw(wire, url, client, Duration::from_secs(5)).await;
}
}
#[cfg(test)]
@@ -271,4 +705,179 @@ mod tests {
let result = forward_query(&make_query(), &upstream, Duration::from_millis(100)).await;
assert!(result.is_err());
}
#[test]
fn parse_addr_ip_only() {
let addr = parse_upstream_addr("1.2.3.4", 53).unwrap();
assert_eq!(addr, "1.2.3.4:53".parse::<SocketAddr>().unwrap());
}
#[test]
fn parse_addr_ip_port() {
let addr = parse_upstream_addr("1.2.3.4:5353", 53).unwrap();
assert_eq!(addr, "1.2.3.4:5353".parse::<SocketAddr>().unwrap());
}
#[test]
fn parse_addr_ipv6_bracketed() {
let addr = parse_upstream_addr("[::1]:5553", 53).unwrap();
assert_eq!(addr, "[::1]:5553".parse::<SocketAddr>().unwrap());
}
#[test]
fn parse_addr_ipv6_bare() {
let addr = parse_upstream_addr("::1", 53).unwrap();
assert_eq!(addr, "[::1]:53".parse::<SocketAddr>().unwrap());
}
#[test]
fn pool_label_single() {
let pool = UpstreamPool::new(vec![Upstream::Udp("1.2.3.4:53".parse().unwrap())], vec![]);
assert_eq!(pool.label(), "1.2.3.4:53");
}
#[test]
fn pool_label_multi() {
let pool = UpstreamPool::new(
vec![Upstream::Udp("1.2.3.4:53".parse().unwrap())],
vec![Upstream::Udp("8.8.8.8:53".parse().unwrap())],
);
assert_eq!(pool.label(), "1.2.3.4:53 (+1 more)");
}
#[tokio::test]
async fn failover_tries_next_on_failure() {
// First upstream is unreachable, second responds
let query = make_query();
let response_bytes = to_wire(&make_response(&query));
let app = axum::Router::new().route(
"/dns-query",
axum::routing::post(move || {
let body = response_bytes.clone();
async move {
(
[(axum::http::header::CONTENT_TYPE, "application/dns-message")],
body,
)
}
}),
);
let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await.unwrap();
let good_addr = listener.local_addr().unwrap();
tokio::spawn(axum::serve(listener, app).into_future());
// Unreachable UDP upstream + working DoH upstream
let pool = UpstreamPool::new(
vec![
Upstream::Udp("127.0.0.1:1".parse().unwrap()), // will fail
Upstream::Doh {
url: format!("http://{}/dns-query", good_addr),
client: reqwest::Client::new(),
},
],
vec![],
);
let srtt = RwLock::new(SrttCache::new(true));
let wire = to_wire(&query);
let resp_wire = forward_with_failover_raw(
&wire,
&pool,
&srtt,
Duration::from_millis(500),
Duration::ZERO,
)
.await
.expect("should fail over to second upstream");
let mut buf = BytePacketBuffer::from_bytes(&resp_wire);
let result = DnsPacket::from_buffer(&mut buf).unwrap();
assert_eq!(result.header.id, 0xABCD);
assert_eq!(result.answers.len(), 1);
}
#[test]
fn maybe_update_primary_swaps_when_different() {
let mut pool = UpstreamPool::new(
vec![Upstream::Udp("1.2.3.4:53".parse().unwrap())],
vec![Upstream::Udp("8.8.8.8:53".parse().unwrap())],
);
assert!(pool.maybe_update_primary("5.6.7.8", 53));
assert_eq!(pool.preferred().unwrap().to_string(), "5.6.7.8:53");
}
#[test]
fn maybe_update_primary_noop_when_same() {
let mut pool =
UpstreamPool::new(vec![Upstream::Udp("1.2.3.4:53".parse().unwrap())], vec![]);
assert!(!pool.maybe_update_primary("1.2.3.4", 53));
}
#[test]
fn maybe_update_primary_rejects_invalid_addr() {
let mut pool =
UpstreamPool::new(vec![Upstream::Udp("1.2.3.4:53".parse().unwrap())], vec![]);
assert!(!pool.maybe_update_primary("not-an-ip", 53));
assert_eq!(pool.preferred().unwrap().to_string(), "1.2.3.4:53");
}
fn tcp_closed_port() -> SocketAddr {
// Bind a TCP listener, grab the port, drop → kernel returns RST on connect.
let listener = std::net::TcpListener::bind("127.0.0.1:0").unwrap();
let addr = listener.local_addr().unwrap();
drop(listener);
addr
}
#[tokio::test]
async fn udp_failure_records_in_srtt() {
let blackhole = crate::testutil::blackhole_upstream();
let pool = UpstreamPool::new(vec![Upstream::Udp(blackhole)], vec![]);
let srtt = RwLock::new(SrttCache::new(true));
let _ = forward_with_failover_raw(
&[0u8; 12],
&pool,
&srtt,
Duration::from_millis(100),
Duration::ZERO,
)
.await;
assert!(srtt.read().unwrap().is_known(blackhole.ip()));
}
#[tokio::test]
async fn dot_failure_records_in_srtt() {
let dead1 = tcp_closed_port();
let dead2 = tcp_closed_port();
let connector = build_dot_connector().unwrap();
let pool = UpstreamPool::new(
vec![
Upstream::Dot {
addr: dead1,
tls_name: Some("dns.quad9.net".to_string()),
connector: connector.clone(),
},
Upstream::Dot {
addr: dead2,
tls_name: Some("dns.quad9.net".to_string()),
connector,
},
],
vec![],
);
let srtt = RwLock::new(SrttCache::new(true));
let _ = forward_with_failover_raw(
&[0u8; 12],
&pool,
&srtt,
Duration::from_millis(500),
Duration::ZERO,
)
.await;
let cache = srtt.read().unwrap();
assert!(cache.is_known(dead1.ip()));
assert!(cache.is_known(dead2.ip()));
}
}

258
src/health.rs Normal file
View File

@@ -0,0 +1,258 @@
//! Health metadata and `/health` response shape, shared between the main
//! HTTP API and the mobile API.
//!
//! The static fields (version, hostname, DoT config, CA fingerprint,
//! feature list) are computed once at startup and stored in [`HealthMeta`]
//! on `ServerCtx`. Per-request fields (uptime, LAN IP) are computed live.
//! Both handlers call [`HealthResponse::build`] to assemble the JSON
//! response from `HealthMeta` + live inputs.
//!
//! JSON schema is documented in `docs/implementation/ios-companion-app.md`
//! §4.2. The iOS companion app's `HealthInfo` struct is the canonical
//! consumer; any change to this response must keep that struct decoding
//! cleanly (all consumed fields are optional on the Swift side, but
//! `lan_ip` is load-bearing for the pipeline).
use std::net::Ipv4Addr;
use std::path::Path;
use std::time::Instant;
use ring::digest::{digest, SHA256};
use serde::Serialize;
/// Immutable health metadata cached on `ServerCtx`. Built once at startup
/// from config + file-system state (CA cert).
#[derive(Clone)]
pub struct HealthMeta {
pub version: &'static str,
pub hostname: String,
pub sni: String,
pub dot_enabled: bool,
pub dot_port: u16,
pub api_port: u16,
pub ca_fingerprint_sha256: Option<String>,
pub features: Vec<String>,
pub started_at: Instant,
}
impl HealthMeta {
/// Minimal `HealthMeta` for unit tests that construct a `ServerCtx`
/// without needing the real startup flow (CA file reads, hostname
/// detection, etc.). Deterministic values so test JSON assertions
/// stay stable.
#[cfg(test)]
pub fn test_fixture() -> Self {
HealthMeta {
version: crate::version(),
hostname: "test-host".to_string(),
sni: "numa.numa".to_string(),
dot_enabled: false,
dot_port: 853,
api_port: 8765,
ca_fingerprint_sha256: None,
features: vec![],
started_at: Instant::now(),
}
}
/// Build a new HealthMeta from config + startup-time environment.
/// Call once at server boot; the returned value is cheap to clone
/// (small number of short strings) and lives on `ServerCtx`.
///
/// The argument count is deliberate — each flag corresponds to a
/// specific config value and is clearly named at the call site.
/// Collapsing into a struct hides nothing meaningful for a one-call
/// initializer.
#[allow(clippy::too_many_arguments)]
pub fn build(
data_dir: &Path,
dot_enabled: bool,
dot_port: u16,
api_port: u16,
dnssec_enabled: bool,
recursive_enabled: bool,
mdns_enabled: bool,
blocking_enabled: bool,
doh_enabled: bool,
) -> Self {
let ca_path = data_dir.join("ca.pem");
let ca_fingerprint_sha256 = compute_ca_fingerprint(&ca_path);
let mut features = Vec::new();
if doh_enabled {
features.push("doh".to_string());
}
if dot_enabled {
features.push("dot".to_string());
}
if recursive_enabled {
features.push("recursive".to_string());
}
if blocking_enabled {
features.push("blocking".to_string());
}
if mdns_enabled {
features.push("mdns".to_string());
}
if dnssec_enabled {
features.push("dnssec".to_string());
}
HealthMeta {
version: crate::version(),
hostname: crate::hostname(),
sni: "numa.numa".to_string(),
dot_enabled,
dot_port,
api_port,
ca_fingerprint_sha256,
features,
started_at: Instant::now(),
}
}
}
/// JSON response shape returned by `GET /health` on both main and mobile APIs.
///
/// Fields are organized to match the iOS companion app's
/// `HealthInfo` Swift struct — see `ios-companion-app.md` §4.2.
#[derive(Serialize)]
pub struct HealthResponse {
pub status: &'static str,
pub version: &'static str,
pub uptime_secs: u64,
pub hostname: String,
pub lan_ip: Option<String>,
pub sni: String,
pub dot: DotBlock,
pub api: ApiBlock,
pub ca: CaBlock,
pub features: Vec<String>,
}
#[derive(Serialize)]
pub struct DotBlock {
pub enabled: bool,
pub port: Option<u16>,
}
#[derive(Serialize)]
pub struct ApiBlock {
pub port: u16,
}
#[derive(Serialize)]
pub struct CaBlock {
pub present: bool,
pub fingerprint_sha256: Option<String>,
}
impl HealthResponse {
/// Assemble a fresh `HealthResponse` from the cached metadata and
/// the current LAN IP (which may change across network transitions).
/// Pass `None` for `lan_ip` if detection fails — the response still
/// returns 200 OK, just without the LAN address.
pub fn build(meta: &HealthMeta, lan_ip: Option<Ipv4Addr>) -> Self {
HealthResponse {
status: "ok",
version: meta.version,
uptime_secs: meta.started_at.elapsed().as_secs(),
hostname: meta.hostname.clone(),
lan_ip: lan_ip.map(|ip| ip.to_string()),
sni: meta.sni.clone(),
dot: DotBlock {
enabled: meta.dot_enabled,
port: if meta.dot_enabled {
Some(meta.dot_port)
} else {
None
},
},
api: ApiBlock {
port: meta.api_port,
},
ca: CaBlock {
present: meta.ca_fingerprint_sha256.is_some(),
fingerprint_sha256: meta.ca_fingerprint_sha256.clone(),
},
features: meta.features.clone(),
}
}
}
/// Read the CA cert at `ca_path` and return its SHA-256 fingerprint as a
/// lowercase hex string, or None if the file doesn't exist or can't be read.
///
/// Hashes the raw PEM bytes for simplicity. A more canonical SPKI-based
/// fingerprint would require parsing the PEM → DER → extracting
/// SubjectPublicKeyInfo, which adds complexity without meaningful benefit
/// for our use case (the iOS app uses the fingerprint only for display
/// and to detect rotation).
fn compute_ca_fingerprint(ca_path: &Path) -> Option<String> {
let pem = std::fs::read(ca_path).ok()?;
let hash = digest(&SHA256, &pem);
let hex: String = hash.as_ref().iter().map(|b| format!("{:02x}", b)).collect();
Some(hex)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn health_response_contains_required_fields() {
let meta = HealthMeta {
version: "0.10.0",
hostname: "test-host".to_string(),
sni: "numa.numa".to_string(),
dot_enabled: true,
dot_port: 853,
api_port: 8765,
ca_fingerprint_sha256: Some("abcd1234".to_string()),
features: vec!["dot".to_string(), "dnssec".to_string()],
started_at: Instant::now(),
};
let response = HealthResponse::build(&meta, Some(Ipv4Addr::new(192, 168, 1, 50)));
let json = serde_json::to_string(&response).unwrap();
assert!(json.contains("\"status\":\"ok\""));
assert!(json.contains("\"version\":\"0.10.0\""));
assert!(json.contains("\"hostname\":\"test-host\""));
assert!(json.contains("\"lan_ip\":\"192.168.1.50\""));
assert!(json.contains("\"sni\":\"numa.numa\""));
assert!(json.contains("\"port\":853"));
assert!(json.contains("\"port\":8765"));
assert!(json.contains("\"fingerprint_sha256\":\"abcd1234\""));
assert!(json.contains("\"features\":[\"dot\",\"dnssec\"]"));
}
#[test]
fn health_response_omits_dot_port_when_disabled() {
let meta = HealthMeta {
version: "0.10.0",
hostname: "t".to_string(),
sni: "numa.numa".to_string(),
dot_enabled: false,
dot_port: 853,
api_port: 8765,
ca_fingerprint_sha256: None,
features: vec![],
started_at: Instant::now(),
};
let response = HealthResponse::build(&meta, None);
let json = serde_json::to_string(&response).unwrap();
assert!(json.contains("\"enabled\":false"));
assert!(json.contains("\"dot\":{\"enabled\":false,\"port\":null}"));
assert!(json.contains("\"present\":false"));
assert!(json.contains("\"lan_ip\":null"));
}
#[test]
fn ca_fingerprint_returns_none_for_missing_file() {
let fp = compute_ca_fingerprint(Path::new("/nonexistent/ca.pem"));
assert!(fp.is_none());
}
}

View File

@@ -9,6 +9,7 @@ use crate::buffer::BytePacketBuffer;
use crate::config::LanConfig;
use crate::ctx::ServerCtx;
use crate::header::DnsHeader;
use crate::health::HealthMeta;
use crate::question::{DnsQuestion, QueryType};
// --- Constants ---
@@ -18,6 +19,18 @@ const MDNS_PORT: u16 = 5353;
const SERVICE_TYPE: &str = "_numa._tcp.local";
const MDNS_TTL: u32 = 120;
// TXT record key prefixes (including the trailing `=`). Shared between
// the sender (`build_announcement`) and the receiver (`parse_mdns_response`)
// to prevent drift — both sides match on the same literal, not on two
// independent string constants that could diverge.
const TXT_SERVICES: &str = "services=";
const TXT_ID: &str = "id=";
const TXT_VERSION: &str = "version=";
const TXT_API_PORT: &str = "api_port=";
const TXT_PROTO: &str = "proto=";
const TXT_DOT_PORT: &str = "dot_port=";
const TXT_CA_FP: &str = "ca_fp=";
// --- Peer Store ---
pub struct PeerStore {
@@ -97,14 +110,16 @@ pub fn detect_lan_ip() -> Option<Ipv4Addr> {
}
}
/// Short hostname for mDNS instance names (`<short>._numa._tcp.local`).
/// Truncates at the first `.` so `macbook-pro.local` becomes `macbook-pro`.
/// Uses the shared `crate::hostname()` helper as the source.
fn get_hostname() -> String {
std::process::Command::new("hostname")
.output()
.ok()
.and_then(|o| String::from_utf8(o.stdout).ok())
.map(|h| h.trim().split('.').next().unwrap_or("numa").to_string())
.filter(|h| !h.is_empty())
.unwrap_or_else(|| "numa".to_string())
crate::hostname()
.split('.')
.next()
.filter(|s| !s.is_empty())
.unwrap_or("numa")
.to_string()
}
/// Generate a per-process instance ID for self-filtering on multi-instance hosts
@@ -168,13 +183,22 @@ pub async fn start_lan_discovery(ctx: Arc<ServerCtx>, config: &LanConfig) {
.map(|e| (e.name.clone(), e.target_port))
.collect()
};
if services.is_empty() {
continue;
}
// Note: we always announce ourselves, even when the
// services list is empty. The announcement still carries
// the mobile API port + version + CA fingerprint in TXT,
// which is what the iOS companion app browses for via
// NWBrowser on `_numa._tcp.local`. Other Numa peers
// receive these empty-services announcements too and
// correctly ignore them in parse_mdns_response (the
// receiver only processes when services is non-empty).
let current_ip = *sender_ctx.lan_ip.lock().unwrap();
if let Ok(pkt) =
build_announcement(&sender_hostname, current_ip, &services, &sender_instance_id)
{
if let Ok(pkt) = build_announcement(
&sender_hostname,
current_ip,
&services,
&sender_instance_id,
&sender_ctx.health_meta,
) {
let _ = sender_socket.send_to(pkt.filled(), dest).await;
}
}
@@ -240,6 +264,7 @@ fn build_announcement(
ip: Ipv4Addr,
services: &[(String, u16)],
inst_id: &str,
meta: &HealthMeta,
) -> crate::Result<BytePacketBuffer> {
let mut buf = BytePacketBuffer::new();
let instance_name = format!("{}._numa._tcp.local", hostname);
@@ -260,7 +285,11 @@ fn build_announcement(
patch_rdlen(&mut buf, rdlen_pos, rdata_start)?;
// SRV: <instance>._numa._tcp.local → <hostname>.local
// Port in SRV is informational; actual service ports are in TXT
// Port = mobile API port, which is what the iOS companion app resolves
// the SRV record for. Legacy Numa peers don't read the SRV port (see
// parse_mdns_response — it only uses TXT services= for peer discovery),
// so changing the SRV port from "first service's port" to the mobile
// API port is backwards compatible.
write_record_header(
&mut buf,
&instance_name,
@@ -273,11 +302,13 @@ fn build_announcement(
let rdata_start = buf.pos();
buf.write_u16(0)?; // priority
buf.write_u16(0)?; // weight
buf.write_u16(services.first().map(|(_, p)| *p).unwrap_or(0))?; // first service port for SRV display
buf.write_u16(meta.api_port)?; // mobile API port, for iOS companion app
buf.write_qname(&host_local)?;
patch_rdlen(&mut buf, rdlen_pos, rdata_start)?;
// TXT: services + instance ID for self-filtering
// TXT: legacy peer-discovery entries (services, id) + enriched entries
// for the iOS companion app (version, api_port, proto, dot_port, ca_fp).
// All in one TXT RRset per mDNS convention.
write_record_header(
&mut buf,
&instance_name,
@@ -293,8 +324,21 @@ fn build_announcement(
.map(|(name, port)| format!("{}:{}", name, port))
.collect::<Vec<_>>()
.join(",");
write_txt_string(&mut buf, &format!("services={}", svc_str))?;
write_txt_string(&mut buf, &format!("id={}", inst_id))?;
// Legacy peer-discovery entries (consumed by parse_mdns_response)
write_txt_string(&mut buf, &format!("{}{}", TXT_SERVICES, svc_str))?;
write_txt_string(&mut buf, &format!("{}{}", TXT_ID, inst_id))?;
// Enriched entries (consumed by the iOS/Android companion apps)
write_txt_string(&mut buf, &format!("{}{}", TXT_VERSION, meta.version))?;
write_txt_string(&mut buf, &format!("{}{}", TXT_API_PORT, meta.api_port))?;
if meta.dot_enabled {
write_txt_string(&mut buf, &format!("{}dot", TXT_PROTO))?;
write_txt_string(&mut buf, &format!("{}{}", TXT_DOT_PORT, meta.dot_port))?;
} else {
write_txt_string(&mut buf, &format!("{}plain", TXT_PROTO))?;
}
if let Some(fp) = &meta.ca_fingerprint_sha256 {
write_txt_string(&mut buf, &format!("{}{}", TXT_CA_FP, fp))?;
}
patch_rdlen(&mut buf, rdlen_pos, rdata_start)?;
// A: <hostname>.local → IP
@@ -408,7 +452,7 @@ fn parse_mdns_response(data: &[u8]) -> Option<MdnsAnnouncement> {
break;
}
if let Ok(txt) = std::str::from_utf8(&data[pos..pos + txt_len]) {
if let Some(val) = txt.strip_prefix("services=") {
if let Some(val) = txt.strip_prefix(TXT_SERVICES) {
let svcs: Vec<(String, u16)> = val
.split(',')
.filter_map(|s| {
@@ -421,7 +465,7 @@ fn parse_mdns_response(data: &[u8]) -> Option<MdnsAnnouncement> {
if !svcs.is_empty() {
txt_services = Some(svcs);
}
} else if let Some(id) = txt.strip_prefix("id=") {
} else if let Some(id) = txt.strip_prefix(TXT_ID) {
peer_instance_id = Some(id.to_string());
}
}

View File

@@ -5,10 +5,15 @@ pub mod cache;
pub mod config;
pub mod ctx;
pub mod dnssec;
pub mod doh;
pub mod dot;
pub mod forward;
pub mod header;
pub mod health;
pub mod lan;
pub mod mobile_api;
pub mod mobileconfig;
pub mod odoh;
pub mod override_store;
pub mod packet;
pub mod proxy;
@@ -16,28 +21,94 @@ pub mod query_log;
pub mod question;
pub mod record;
pub mod recursive;
pub mod relay;
pub mod serve;
pub mod service_store;
pub mod setup_phone;
pub mod srtt;
pub mod stats;
pub mod svcb;
pub mod system_dns;
pub mod tls;
pub mod wire;
#[cfg(windows)]
pub mod windows_service;
#[cfg(test)]
pub(crate) mod testutil;
pub type Error = Box<dyn std::error::Error + Send + Sync>;
pub type Result<T> = std::result::Result<T, Error>;
/// Build version string. On tagged releases: `0.13.1`. On commits ahead
/// of a tag: `0.13.1+a87f907`. With uncommitted changes: `0.13.1+a87f907-dirty`.
/// Falls back to `CARGO_PKG_VERSION` when built outside a git repo (e.g.
/// from a source tarball).
pub fn version() -> &'static str {
option_env!("NUMA_BUILD_VERSION").unwrap_or(env!("CARGO_PKG_VERSION"))
}
/// Detect the machine hostname via the `hostname` command. Returns the
/// full hostname (e.g., `macbook-pro.local`), or `"numa"` if the command
/// fails. Call sites that need the short form (e.g., mDNS instance
/// names) should truncate at the first `.`.
pub fn hostname() -> String {
std::process::Command::new("hostname")
.output()
.ok()
.and_then(|o| String::from_utf8(o.stdout).ok())
.map(|h| h.trim().to_string())
.filter(|h| !h.is_empty())
.unwrap_or_else(|| "numa".to_string())
}
/// Path to suggest to an interactive user when asking them to create
/// `numa.toml`. Prefers `$HOME/.config/numa/numa.toml` when HOME is set
/// (actionable without sudo); falls back to `config_dir()` otherwise.
///
/// Note: `config_dir()` routes interactive root to FHS (`/var/lib/numa`)
/// so that runtime state like `services.json` stays continuous with the
/// installed daemon. This helper exists specifically to give advisories
/// and `load_config` an XDG-aware path for user-authored config, without
/// moving runtime state out of FHS — see issue #81.
pub(crate) fn suggested_config_path() -> std::path::PathBuf {
#[cfg(not(windows))]
{
resolve_suggested_config_path(std::env::var("HOME").ok().as_deref(), config_dir)
}
#[cfg(windows)]
{
config_dir().join("numa.toml")
}
}
#[cfg(not(windows))]
fn resolve_suggested_config_path<F>(home: Option<&str>, fallback_dir: F) -> std::path::PathBuf
where
F: FnOnce() -> std::path::PathBuf,
{
if let Some(home) = home {
if !home.is_empty() && home != "/" {
return std::path::PathBuf::from(home)
.join(".config")
.join("numa")
.join("numa.toml");
}
}
fallback_dir().join("numa.toml")
}
/// Shared config directory for persistent data (services.json, etc).
/// Unix users: ~/.config/numa/
/// Linux root daemon: /var/lib/numa (FHS) — falls back to /usr/local/var/numa
/// if a pre-v0.10.1 install already lives there.
/// macOS root daemon: /usr/local/var/numa (Homebrew prefix)
/// Windows: %APPDATA%\numa
/// Windows: %PROGRAMDATA%\numa (same as data_dir — no per-user config on Windows)
pub fn config_dir() -> std::path::PathBuf {
#[cfg(windows)]
{
std::path::PathBuf::from(
std::env::var("APPDATA").unwrap_or_else(|_| "C:\\ProgramData".into()),
)
.join("numa")
data_dir()
}
#[cfg(not(windows))]
{
@@ -144,4 +215,73 @@ mod tests {
fn linux_data_dir_only_fhs_uses_fhs() {
assert_eq!(resolve_linux_data_dir(false, true), "/var/lib/numa");
}
#[cfg(not(windows))]
fn fhs() -> std::path::PathBuf {
std::path::PathBuf::from("/var/lib/numa")
}
#[cfg(not(windows))]
#[test]
fn suggested_config_path_prefers_home() {
assert_eq!(
resolve_suggested_config_path(Some("/home/alice"), fhs),
std::path::PathBuf::from("/home/alice/.config/numa/numa.toml"),
);
}
#[cfg(not(windows))]
#[test]
fn suggested_config_path_prefers_root_home_over_fhs() {
// Interactive root: HOME=/root is a real user context, not a daemon signal.
// Advisory must point where load_config will actually look — issue #81.
assert_eq!(
resolve_suggested_config_path(Some("/root"), fhs),
std::path::PathBuf::from("/root/.config/numa/numa.toml"),
);
}
#[cfg(not(windows))]
#[test]
fn suggested_config_path_falls_back_when_home_unset() {
assert_eq!(
resolve_suggested_config_path(None, fhs),
std::path::PathBuf::from("/var/lib/numa/numa.toml"),
);
}
#[cfg(not(windows))]
#[test]
fn suggested_config_path_falls_back_when_home_is_root() {
// systemd services sometimes have HOME=/ — don't treat that as a real home.
assert_eq!(
resolve_suggested_config_path(Some("/"), fhs),
std::path::PathBuf::from("/var/lib/numa/numa.toml"),
);
}
#[cfg(not(windows))]
#[test]
fn suggested_config_path_falls_back_when_home_is_empty() {
assert_eq!(
resolve_suggested_config_path(Some(""), fhs),
std::path::PathBuf::from("/var/lib/numa/numa.toml"),
);
}
#[cfg(not(windows))]
#[test]
fn suggested_config_path_skips_fallback_when_home_valid() {
// Happy path shouldn't probe the filesystem via config_dir().
let called = std::cell::Cell::new(false);
let fallback = || {
called.set(true);
std::path::PathBuf::from("/should/not/be/used")
};
let _ = resolve_suggested_config_path(Some("/home/alice"), fallback);
assert!(
!called.get(),
"fallback must not be invoked when HOME is valid"
);
}
}

View File

@@ -1,36 +1,34 @@
use std::net::SocketAddr;
use std::sync::{Arc, Mutex, RwLock};
use std::time::Duration;
use arc_swap::ArcSwap;
use log::{error, info};
use tokio::net::UdpSocket;
use numa::blocklist::{download_blocklists, parse_blocklist, BlocklistStore};
use numa::buffer::BytePacketBuffer;
use numa::cache::DnsCache;
use numa::config::{build_zone_map, load_config, ConfigLoad};
use numa::ctx::{handle_query, ServerCtx};
use numa::forward::Upstream;
use numa::override_store::OverrideStore;
use numa::query_log::QueryLog;
use numa::service_store::ServiceStore;
use numa::stats::ServerStats;
use numa::system_dns::{
discover_system_dns, install_service, restart_service, service_status, uninstall_service,
install_service, restart_service, service_status, start_service, stop_service,
uninstall_service,
};
const QUAD9_IP: &str = "9.9.9.9";
const DOH_FALLBACK: &str = "https://9.9.9.9/dns-query";
fn main() -> numa::Result<()> {
// Handle CLI subcommands
let arg1 = std::env::args().nth(1).unwrap_or_default();
#[cfg(windows)]
if arg1 == "--service" {
// Running under SCM — stderr goes nowhere. Redirect logs to a file.
let log_path = numa::data_dir().join("numa.log");
let log_file = std::fs::OpenOptions::new()
.create(true)
.append(true)
.open(&log_path)
.expect("failed to open log file");
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info"))
.format_timestamp_millis()
.target(env_logger::Target::Pipe(Box::new(log_file)))
.init();
numa::windows_service::run_as_service()
.map_err(|e| format!("windows service dispatcher failed: {}", e))?;
return Ok(());
}
#[tokio::main]
async fn main() -> numa::Result<()> {
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info"))
.format_timestamp_millis()
.init();
// Handle CLI subcommands
let arg1 = std::env::args().nth(1).unwrap_or_default();
match arg1.as_str() {
"install" => {
eprintln!("\x1b[1;38;2;192;98;58mNuma\x1b[0m — installing\n");
@@ -44,8 +42,8 @@ async fn main() -> numa::Result<()> {
let sub = std::env::args().nth(2).unwrap_or_default();
eprintln!("\x1b[1;38;2;192;98;58mNuma\x1b[0m — service management\n");
return match sub.as_str() {
"start" => install_service().map_err(|e| e.into()),
"stop" => uninstall_service().map_err(|e| e.into()),
"start" => start_service().map_err(|e| e.into()),
"stop" => stop_service().map_err(|e| e.into()),
"restart" => restart_service().map_err(|e| e.into()),
"status" => service_status().map_err(|e| e.into()),
_ => {
@@ -54,6 +52,40 @@ async fn main() -> numa::Result<()> {
}
};
}
"setup-phone" => {
let runtime = tokio::runtime::Builder::new_current_thread()
.enable_all()
.build()?;
return runtime
.block_on(numa::setup_phone::run())
.map_err(|e| e.into());
}
"relay" => {
let port: u16 = std::env::args()
.nth(2)
.as_deref()
.and_then(|s| s.parse().ok())
.unwrap_or(8443);
let bind: std::net::IpAddr = std::env::args()
.nth(3)
.as_deref()
.map(|s| {
s.parse().unwrap_or_else(|e| {
eprintln!("invalid bind address '{}': {}", s, e);
std::process::exit(1);
})
})
.unwrap_or(std::net::IpAddr::V4(std::net::Ipv4Addr::LOCALHOST));
let addr = std::net::SocketAddr::new(bind, port);
eprintln!(
"\x1b[1;38;2;192;98;58mNuma\x1b[0m — ODoH relay on {}\n",
addr
);
let runtime = tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()?;
return runtime.block_on(numa::relay::run(addr));
}
"lan" => {
let sub = std::env::args().nth(2).unwrap_or_default();
let config_path = std::env::args()
@@ -85,12 +117,29 @@ async fn main() -> numa::Result<()> {
eprintln!(" service status Check if the service is running");
eprintln!(" lan on Enable LAN service discovery (mDNS)");
eprintln!(" lan off Disable LAN service discovery");
eprintln!(" relay [PORT] [BIND]");
eprintln!(" Run as an ODoH relay (RFC 9230, default 127.0.0.1:8443)");
eprintln!(" setup-phone Generate a QR code to install Numa DoT on a phone");
eprintln!(" help Show this help");
eprintln!();
eprintln!("Config path defaults to numa.toml");
return Ok(());
}
_ => {}
_ => {
if !arg1.is_empty()
&& arg1 != "run"
&& !arg1.contains('/')
&& !arg1.contains('\\')
&& !arg1.ends_with(".toml")
{
eprintln!(
"\x1b[1;38;2;192;98;58mNuma\x1b[0m — unknown command: \x1b[1m{}\x1b[0m\n",
arg1
);
eprintln!("Run \x1b[1mnuma help\x1b[0m for a list of commands.");
std::process::exit(1);
}
}
}
let config_path = if arg1.is_empty() || arg1 == "run" {
@@ -100,504 +149,11 @@ async fn main() -> numa::Result<()> {
} else {
arg1 // treat as config path for backwards compatibility
};
let ConfigLoad {
config,
path: resolved_config_path,
found: config_found,
} = load_config(&config_path)?;
// Discover system DNS in a single pass (upstream + forwarding rules)
let system_dns = discover_system_dns();
let root_hints = numa::recursive::parse_root_hints(&config.upstream.root_hints);
let (resolved_mode, upstream_auto, upstream, upstream_label) = match config.upstream.mode {
numa::config::UpstreamMode::Auto => {
info!("auto mode: probing recursive resolution...");
if numa::recursive::probe_recursive(&root_hints).await {
info!("recursive probe succeeded — self-sovereign mode");
let dummy = Upstream::Udp("0.0.0.0:0".parse().unwrap());
(
numa::config::UpstreamMode::Recursive,
false,
dummy,
"recursive (root hints)".to_string(),
)
} else {
log::warn!("recursive probe failed — falling back to Quad9 DoH");
let client = reqwest::Client::builder()
.use_rustls_tls()
.build()
.unwrap_or_default();
let url = DOH_FALLBACK.to_string();
let label = url.clone();
(
numa::config::UpstreamMode::Forward,
false,
Upstream::Doh { url, client },
label,
)
}
}
numa::config::UpstreamMode::Recursive => {
let dummy = Upstream::Udp("0.0.0.0:0".parse().unwrap());
(
numa::config::UpstreamMode::Recursive,
false,
dummy,
"recursive (root hints)".to_string(),
)
}
numa::config::UpstreamMode::Forward => {
let upstream_addr = if config.upstream.address.is_empty() {
system_dns
.default_upstream
.or_else(numa::system_dns::detect_dhcp_dns)
.unwrap_or_else(|| {
info!("could not detect system DNS, falling back to Quad9 DoH");
DOH_FALLBACK.to_string()
})
} else {
config.upstream.address.clone()
};
let upstream: Upstream = if upstream_addr.starts_with("https://") {
let client = reqwest::Client::builder()
.use_rustls_tls()
.build()
.unwrap_or_default();
Upstream::Doh {
url: upstream_addr,
client,
}
} else {
let addr: SocketAddr =
format!("{}:{}", upstream_addr, config.upstream.port).parse()?;
Upstream::Udp(addr)
};
let label = upstream.to_string();
(
numa::config::UpstreamMode::Forward,
config.upstream.address.is_empty(),
upstream,
label,
)
}
};
let api_port = config.server.api_port;
let mut blocklist = BlocklistStore::new();
for domain in &config.blocking.allowlist {
blocklist.add_to_allowlist(domain);
}
if !config.blocking.enabled {
blocklist.set_enabled(false);
}
// Build service store: config services + persisted user services
let mut service_store = ServiceStore::new();
service_store.insert_from_config("numa", config.server.api_port, Vec::new());
for svc in &config.services {
service_store.insert_from_config(&svc.name, svc.target_port, svc.routes.clone());
}
service_store.load_persisted();
let forwarding_rules = system_dns.forwarding_rules;
// Resolve data_dir from config, falling back to the platform default.
// Used for TLS CA storage below and stored on ServerCtx for runtime use.
let resolved_data_dir = config
.server
.data_dir
.clone()
.unwrap_or_else(numa::data_dir);
// Build initial TLS config before ServerCtx (so ArcSwap is ready at construction)
let initial_tls = if config.proxy.enabled && config.proxy.tls_port > 0 {
let service_names = service_store.names();
match numa::tls::build_tls_config(
&config.proxy.tld,
&service_names,
Vec::new(),
&resolved_data_dir,
) {
Ok(tls_config) => Some(ArcSwap::from(tls_config)),
Err(e) => {
if let Some(advisory) = numa::tls::try_data_dir_advisory(&e, &resolved_data_dir) {
eprint!("{}", advisory);
} else {
log::warn!("TLS setup failed, HTTPS proxy disabled: {}", e);
}
None
}
}
} else {
None
};
let socket = match UdpSocket::bind(&config.server.bind_addr).await {
Ok(s) => s,
Err(e) => {
if let Some(advisory) =
numa::system_dns::try_port53_advisory(&config.server.bind_addr, &e)
{
eprint!("{}", advisory);
std::process::exit(1);
}
return Err(e.into());
}
};
let ctx = Arc::new(ServerCtx {
socket,
zone_map: build_zone_map(&config.zones)?,
cache: RwLock::new(DnsCache::new(
config.cache.max_entries,
config.cache.min_ttl,
config.cache.max_ttl,
)),
stats: Mutex::new(ServerStats::new()),
overrides: RwLock::new(OverrideStore::new()),
blocklist: RwLock::new(blocklist),
query_log: Mutex::new(QueryLog::new(1000)),
services: Mutex::new(service_store),
lan_peers: Mutex::new(numa::lan::PeerStore::new(config.lan.peer_timeout_secs)),
forwarding_rules,
upstream: Mutex::new(upstream),
upstream_auto,
upstream_port: config.upstream.port,
lan_ip: Mutex::new(numa::lan::detect_lan_ip().unwrap_or(std::net::Ipv4Addr::LOCALHOST)),
timeout: Duration::from_millis(config.upstream.timeout_ms),
proxy_tld_suffix: if config.proxy.tld.is_empty() {
String::new()
} else {
format!(".{}", config.proxy.tld)
},
proxy_tld: config.proxy.tld.clone(),
lan_enabled: config.lan.enabled,
config_path: resolved_config_path,
config_found,
config_dir: numa::config_dir(),
data_dir: resolved_data_dir,
tls_config: initial_tls,
upstream_mode: resolved_mode,
root_hints,
srtt: std::sync::RwLock::new(numa::srtt::SrttCache::new(config.upstream.srtt)),
inflight: std::sync::Mutex::new(std::collections::HashMap::new()),
dnssec_enabled: config.dnssec.enabled,
dnssec_strict: config.dnssec.strict,
});
let zone_count: usize = ctx.zone_map.values().map(|m| m.len()).sum();
// Build banner rows, then size the box to fit the longest value
let api_url = format!("http://localhost:{}", api_port);
let proxy_label = if config.proxy.enabled {
if config.proxy.tls_port > 0 {
Some(format!(
"http://:{} https://:{}",
config.proxy.port, config.proxy.tls_port
))
} else {
Some(format!(
"http://*.{} on :{}",
config.proxy.tld, config.proxy.port
))
}
} else {
None
};
let config_label = if ctx.config_found {
ctx.config_path.clone()
} else {
format!("{} (defaults)", ctx.config_path)
};
let data_label = ctx.data_dir.display().to_string();
let services_label = ctx.config_dir.join("services.json").display().to_string();
// label (10) + value + padding (2) = inner width; minimum 40 for the title row
let val_w = [
config.server.bind_addr.len(),
api_url.len(),
upstream_label.len(),
config_label.len(),
data_label.len(),
services_label.len(),
]
.into_iter()
.chain(proxy_label.as_ref().map(|s| s.len()))
.max()
.unwrap_or(30);
let w = (val_w + 12).max(42); // 10 label + 2 padding, min 42 for title
let o = "\x1b[38;2;192;98;58m"; // orange
let g = "\x1b[38;2;107;124;78m"; // green
let d = "\x1b[38;2;163;152;136m"; // dim
let r = "\x1b[0m"; // reset
let b = "\x1b[1;38;2;192;98;58m"; // bold orange
let it = "\x1b[3;38;2;163;152;136m"; // italic dim
let bar_top = "".repeat(w);
let bar_mid = "".repeat(w);
let row = |label: &str, color: &str, value: &str| {
eprintln!(
"{o}{r} {color}{:<9}{r} {:<vw$}{o}{r}",
label,
value,
vw = w - 12
);
};
// Title row: center within the box
let title = format!(
"{b}NUMA{r} {it}DNS that governs itself{r} {d}v{}{r}",
env!("CARGO_PKG_VERSION")
);
// The title contains ANSI codes; visible length is ~38 chars. Pad to fill the box.
let title_visible_len = 4 + 2 + 24 + 2 + 1 + env!("CARGO_PKG_VERSION").len() + 1;
let title_pad = w.saturating_sub(title_visible_len);
eprintln!("\n{o}{bar_top}{r}");
eprint!("{o}{r} {title}");
eprintln!("{}{o}{r}", " ".repeat(title_pad));
eprintln!("{o}{bar_top}{r}");
row("DNS", g, &config.server.bind_addr);
row("API", g, &api_url);
row("Dashboard", g, &api_url);
row(
"Upstream",
g,
if ctx.upstream_mode == numa::config::UpstreamMode::Recursive {
"recursive (root hints)"
} else {
&upstream_label
},
);
row("Zones", g, &format!("{} records", zone_count));
row(
"Cache",
g,
&format!("max {} entries", config.cache.max_entries),
);
row(
"Blocking",
g,
&if config.blocking.enabled {
format!("{} lists", config.blocking.lists.len())
} else {
"disabled".to_string()
},
);
if let Some(ref label) = proxy_label {
row("Proxy", g, label);
if config.proxy.bind_addr == "127.0.0.1" {
let y = "\x1b[38;2;204;176;59m"; // yellow
row(
"",
y,
&format!(
"⚠ proxy on 127.0.0.1 — .{} not LAN reachable",
config.proxy.tld
),
);
}
}
if config.dot.enabled {
row("DoT", g, &format!("tls://:{}", config.dot.port));
}
if config.lan.enabled {
row("LAN", g, "mDNS (_numa._tcp.local)");
}
if !ctx.forwarding_rules.is_empty() {
row(
"Routing",
g,
&format!("{} conditional rules", ctx.forwarding_rules.len()),
);
}
eprintln!("{o}{bar_mid}{r}");
row("Config", d, &config_label);
row("Data", d, &data_label);
row("Services", d, &services_label);
eprintln!("{o}{bar_top}{r}\n");
info!(
"numa listening on {}, upstream {}, {} zone records, cache max {}, API on port {}",
config.server.bind_addr, upstream_label, zone_count, config.cache.max_entries, api_port,
);
// Download blocklists on startup
let blocklist_lists = config.blocking.lists.clone();
let refresh_hours = config.blocking.refresh_hours;
if config.blocking.enabled && !blocklist_lists.is_empty() {
let bl_ctx = Arc::clone(&ctx);
let bl_lists = blocklist_lists.clone();
tokio::spawn(async move {
load_blocklists(&bl_ctx, &bl_lists).await;
// Periodic refresh
let mut interval = tokio::time::interval(Duration::from_secs(refresh_hours * 3600));
interval.tick().await; // skip immediate tick
loop {
interval.tick().await;
info!("refreshing blocklists...");
load_blocklists(&bl_ctx, &bl_lists).await;
}
});
}
// Prime TLD cache (recursive mode only)
if ctx.upstream_mode == numa::config::UpstreamMode::Recursive {
let prime_ctx = Arc::clone(&ctx);
let prime_tlds = config.upstream.prime_tlds;
tokio::spawn(async move {
numa::recursive::prime_tld_cache(
&prime_ctx.cache,
&prime_ctx.root_hints,
&prime_tlds,
&prime_ctx.srtt,
)
.await;
});
}
// Spawn HTTP API server
let api_ctx = Arc::clone(&ctx);
let api_addr: SocketAddr = format!("{}:{}", config.server.api_bind_addr, api_port).parse()?;
tokio::spawn(async move {
let app = numa::api::router(api_ctx);
let listener = tokio::net::TcpListener::bind(api_addr).await.unwrap();
info!("HTTP API listening on {}", api_addr);
axum::serve(listener, app).await.unwrap();
});
let proxy_bind: std::net::Ipv4Addr = config
.proxy
.bind_addr
.parse()
.unwrap_or(std::net::Ipv4Addr::LOCALHOST);
// Spawn HTTP reverse proxy for .numa domains
if config.proxy.enabled {
let proxy_ctx = Arc::clone(&ctx);
let proxy_port = config.proxy.port;
tokio::spawn(async move {
numa::proxy::start_proxy(proxy_ctx, proxy_port, proxy_bind).await;
});
}
// Spawn HTTPS reverse proxy with TLS termination
if config.proxy.enabled && config.proxy.tls_port > 0 && ctx.tls_config.is_some() {
let proxy_ctx = Arc::clone(&ctx);
let tls_port = config.proxy.tls_port;
tokio::spawn(async move {
numa::proxy::start_proxy_tls(proxy_ctx, tls_port, proxy_bind).await;
});
}
// Spawn network change watcher (upstream re-detection, LAN IP update, peer flush)
{
let watch_ctx = Arc::clone(&ctx);
tokio::spawn(async move {
network_watch_loop(watch_ctx).await;
});
}
// Spawn LAN service discovery
if config.lan.enabled {
let lan_ctx = Arc::clone(&ctx);
let lan_config = config.lan.clone();
tokio::spawn(async move {
numa::lan::start_lan_discovery(lan_ctx, &lan_config).await;
});
}
// Spawn DNS-over-TLS listener (RFC 7858)
if config.dot.enabled {
let dot_ctx = Arc::clone(&ctx);
let dot_config = config.dot.clone();
tokio::spawn(async move {
numa::dot::start_dot(dot_ctx, &dot_config).await;
});
}
// UDP DNS listener
#[allow(clippy::infinite_loop)]
loop {
let mut buffer = BytePacketBuffer::new();
let (_, src_addr) = match ctx.socket.recv_from(&mut buffer.buf).await {
Ok(r) => r,
Err(e) if e.kind() == std::io::ErrorKind::ConnectionReset => {
// Windows delivers ICMP port-unreachable as ConnectionReset on UDP sockets
continue;
}
Err(e) => return Err(e.into()),
};
let ctx = Arc::clone(&ctx);
tokio::spawn(async move {
if let Err(e) = handle_query(buffer, src_addr, &ctx).await {
error!("{} | HANDLER ERROR | {}", src_addr, e);
}
});
}
}
async fn network_watch_loop(ctx: Arc<numa::ctx::ServerCtx>) {
let mut tick: u64 = 0;
let mut interval = tokio::time::interval(Duration::from_secs(5));
interval.tick().await; // skip immediate tick
loop {
interval.tick().await;
tick += 1;
let mut changed = false;
// Check LAN IP change (every 5s — cheap, one UDP socket call)
if let Some(new_ip) = numa::lan::detect_lan_ip() {
let mut current_ip = ctx.lan_ip.lock().unwrap();
if new_ip != *current_ip {
info!("LAN IP changed: {} → {}", current_ip, new_ip);
*current_ip = new_ip;
changed = true;
numa::recursive::reset_udp_state();
}
}
// Re-detect upstream every 30s or on LAN IP change (UDP only —
// DoH upstreams are explicitly configured via URL, not auto-detected)
if ctx.upstream_auto
&& matches!(*ctx.upstream.lock().unwrap(), Upstream::Udp(_))
&& (changed || tick.is_multiple_of(6))
{
let dns_info = numa::system_dns::discover_system_dns();
let new_addr = dns_info
.default_upstream
.or_else(numa::system_dns::detect_dhcp_dns)
.unwrap_or_else(|| QUAD9_IP.to_string());
if let Ok(new_sock) =
format!("{}:{}", new_addr, ctx.upstream_port).parse::<SocketAddr>()
{
let new_upstream = Upstream::Udp(new_sock);
let mut upstream = ctx.upstream.lock().unwrap();
if *upstream != new_upstream {
info!("upstream changed: {} → {}", upstream, new_upstream);
*upstream = new_upstream;
changed = true;
}
}
}
// Flush stale LAN peers on any network change
if changed {
ctx.lan_peers.lock().unwrap().clear();
info!("flushed LAN peers after network change");
}
// Re-probe UDP every 5 minutes when disabled
if tick.is_multiple_of(60) {
numa::recursive::probe_udp(&ctx.root_hints).await;
}
}
let runtime = tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()?;
runtime.block_on(numa::serve::run(config_path))
}
fn set_lan_enabled(enabled: bool, path: &str) -> numa::Result<()> {
@@ -664,29 +220,3 @@ fn print_lan_status(enabled: bool) {
eprintln!(" Restart Numa to start mDNS discovery");
}
}
async fn load_blocklists(ctx: &ServerCtx, lists: &[String]) {
let downloaded = download_blocklists(lists).await;
// Parse outside the lock to avoid blocking DNS queries during parse (~100ms)
let mut all_domains = std::collections::HashSet::new();
let mut sources = Vec::new();
for (source, text) in &downloaded {
let domains = parse_blocklist(text);
info!("blocklist: {} domains from {}", domains.len(), source);
all_domains.extend(domains);
sources.push(source.clone());
}
let total = all_domains.len();
// Swap under lock — sub-microsecond
ctx.blocklist
.write()
.unwrap()
.swap_domains(all_domains, sources);
info!(
"blocking enabled: {} unique domains from {} lists",
total,
downloaded.len()
);
}

107
src/mobile_api.rs Normal file
View File

@@ -0,0 +1,107 @@
//! Mobile API — persistent HTTP listener for iOS/Android companion apps.
//!
//! Read-only subset of Numa's HTTP surface served on a separate port
//! (default 8765) bound to the LAN. Unlike the main API on port 5380
//! (which defaults to `127.0.0.1` and serves mutating routes like
//! `DELETE /services/{name}` or `PUT /blocking/toggle`), this listener
//! is safe to expose on the LAN because every route is idempotent and
//! read-only.
//!
//! Routes (all GET):
//!
//! - `/health` — enriched status + metadata, shares the handler with the
//! main API via `crate::api::health`
//! - `/ca.pem` — Numa local CA in PEM form, shares the handler with the
//! main API via `crate::api::serve_ca`
//! - `/mobileconfig` — combined CA + DNS settings profile (Full mode)
//! - `/ca.mobileconfig` — CA-only trust profile (no DNS override)
//!
//! The mobile API does NOT include the mutating routes (overrides, cache
//! flush, blocking toggle, service CRUD, etc.). Even if a user sets
//! `api_bind_addr` to `0.0.0.0` for the main API, those routes stay on
//! port 5380; the mobile API on port 8765 never serves them. This is the
//! primary security boundary: anything exposed to the LAN is read-only.
use std::net::Ipv4Addr;
use std::sync::Arc;
use axum::extract::State;
use axum::http::{header, StatusCode};
use axum::response::IntoResponse;
use axum::routing::get;
use axum::Router;
use log::info;
use crate::ctx::ServerCtx;
use crate::mobileconfig::{build_mobileconfig, ProfileMode};
/// Content-Disposition for the full CA + DNS profile download.
const FULL_PROFILE_DISPOSITION: &str = "attachment; filename=\"numa.mobileconfig\"";
/// Content-Disposition for the CA-only profile download.
const CA_ONLY_PROFILE_DISPOSITION: &str = "attachment; filename=\"numa-ca.mobileconfig\"";
/// Build the axum router for the mobile API.
///
/// Shares handler functions with the main API where possible (`health`,
/// `serve_ca`) so the response shapes are identical across both ports.
pub fn router(ctx: Arc<ServerCtx>) -> Router {
Router::new()
.route("/health", get(crate::api::health))
.route("/ca.pem", get(crate::api::serve_ca))
.route("/mobileconfig", get(serve_full_mobileconfig))
.route("/ca.mobileconfig", get(serve_ca_only_mobileconfig))
.with_state(ctx)
}
/// Start the mobile API listener on `bind_addr:port`. Runs until the
/// caller cancels the spawned task. Logs the URL on successful bind.
pub async fn start(ctx: Arc<ServerCtx>, bind_addr: String, port: u16) -> crate::Result<()> {
let addr: std::net::SocketAddr = format!("{}:{}", bind_addr, port).parse()?;
let listener = tokio::net::TcpListener::bind(addr).await?;
info!("Mobile API listening on http://{}", addr);
let app = router(ctx);
axum::serve(listener, app).await?;
Ok(())
}
/// Serve the full mobileconfig profile (CA + DNS settings), with the
/// DNS payload pointing at the current LAN IP. Each request reads the
/// fresh LAN IP from `ctx.lan_ip` so the profile always reflects the
/// laptop's current network state.
async fn serve_full_mobileconfig(
State(ctx): State<Arc<ServerCtx>>,
) -> Result<impl IntoResponse, StatusCode> {
let ca_pem = ctx.ca_pem.as_deref().ok_or(StatusCode::NOT_FOUND)?;
let lan_ip: Ipv4Addr = *ctx.lan_ip.lock().unwrap();
let profile = build_mobileconfig(ProfileMode::Full { lan_ip }, ca_pem);
Ok(profile_response(profile, FULL_PROFILE_DISPOSITION))
}
/// Serve the CA-only mobileconfig profile. Trusts the Numa local CA but
/// does NOT change the device's DNS settings. Used by the iOS companion
/// app's DoT mode, where the app configures DNS via `NEDNSSettingsManager`
/// and only needs the system trust store to accept Numa's self-signed cert.
async fn serve_ca_only_mobileconfig(
State(ctx): State<Arc<ServerCtx>>,
) -> Result<impl IntoResponse, StatusCode> {
let ca_pem = ctx.ca_pem.as_deref().ok_or(StatusCode::NOT_FOUND)?;
let profile = build_mobileconfig(ProfileMode::CaOnly, ca_pem);
Ok(profile_response(profile, CA_ONLY_PROFILE_DISPOSITION))
}
/// Shared response constructor for both mobileconfig variants.
/// Identical headers; only the Content-Disposition filename differs.
fn profile_response(profile: String, disposition: &'static str) -> impl IntoResponse {
(
[
(header::CONTENT_TYPE, "application/x-apple-aspen-config"),
(header::CONTENT_DISPOSITION, disposition),
(header::CACHE_CONTROL, "no-store"),
],
profile,
)
}

305
src/mobileconfig.rs Normal file
View File

@@ -0,0 +1,305 @@
//! Apple `.mobileconfig` profile generator.
//!
//! Builds iOS Configuration Profiles that Numa serves to phones for one-tap
//! CA trust and DNS-over-TLS setup. The plist structure is hand-rendered
//! via `format!` — no plist crate dependency, deterministic output, small
//! binary footprint.
//!
//! Two modes:
//!
//! - [`ProfileMode::Full`]: CA trust payload + DNS settings payload pointing
//! at a specific LAN IP over DoT. This is what `numa setup-phone` has
//! always produced — the user scans a QR, installs this profile, and the
//! phone is configured for DoT through Numa in a single step (after the
//! iOS Certificate Trust Settings toggle, which is a separate system
//! gate we can't bypass).
//!
//! - [`ProfileMode::CaOnly`]: CA trust payload only, no DNS settings. Used
//! by the future iOS companion app flow where `NEDNSSettingsManager`
//! configures DNS programmatically and we only need the system trust
//! store to accept Numa's DoT cert. Installing this profile does NOT
//! change the user's DNS at all.
//!
//! Payload identifiers and UUIDs are fixed (not randomized) so iOS replaces
//! the existing profile on re-install rather than accumulating duplicates.
//! The `Full` and `CaOnly` profiles have distinct top-level UUIDs so they
//! can coexist as separate installed profiles, but they share the same CA
//! payload UUID since the CA itself is the same trust anchor in both.
use std::net::Ipv4Addr;
/// Top-level UUID and PayloadIdentifier for the full profile (CA + DNS).
/// Changing this breaks in-place replacement on existing iOS installs.
const FULL_PROFILE_UUID: &str = "F1E2D3C4-B5A6-7890-1234-567890ABCDEF";
const FULL_PROFILE_ID: &str = "com.numa.dns.profile";
/// Top-level UUID and PayloadIdentifier for the CA-only profile.
/// Distinct from `FULL_PROFILE_UUID` so a user can install one, the other,
/// or both without the latest install silently replacing a different mode.
const CA_ONLY_PROFILE_UUID: &str = "F2E3D4C5-B6A7-8901-2345-67890ABCDEF0";
const CA_ONLY_PROFILE_ID: &str = "com.numa.dns.ca.profile";
/// CA trust payload UUID. Same in both modes — iOS will see "the same CA
/// trust anchor" regardless of which wrapping profile contains it.
const CA_PAYLOAD_UUID: &str = "B2C3D4E5-F6A7-8901-BCDE-F12345678901";
const CA_PAYLOAD_ID: &str = "com.numa.dns.ca";
/// DNS settings payload UUID (Full mode only).
const DNS_PAYLOAD_UUID: &str = "A1B2C3D4-E5F6-7890-ABCD-EF1234567890";
const DNS_PAYLOAD_ID: &str = "com.numa.dns.dot";
/// Profile mode determines which payloads are included in the generated
/// `.mobileconfig`.
#[derive(Debug, Clone)]
pub enum ProfileMode {
/// Full profile: CA trust anchor + managed DNS settings payload
/// pointing at the given LAN IP over DoT. This is what the classic
/// `numa setup-phone` QR flow serves.
Full { lan_ip: Ipv4Addr },
/// CA-only profile: just the trust anchor, no DNS settings. For use
/// with the iOS companion app which manages DNS programmatically via
/// `NEDNSSettingsManager` and only needs the system trust store to
/// accept Numa's self-signed DoT cert.
CaOnly,
}
/// Build a full `.mobileconfig` profile as an XML plist string.
pub fn build_mobileconfig(mode: ProfileMode, ca_pem: &str) -> String {
let ca_payload = build_ca_payload(ca_pem);
match mode {
ProfileMode::Full { lan_ip } => {
let dns_payload = build_dns_payload(lan_ip);
let payloads = format!("{}\n{}", ca_payload, dns_payload);
let description = format!(
"Trusts the Numa local CA and routes DNS queries to Numa over DoT on your local network ({lan_ip})"
);
wrap_plist(
&payloads,
FULL_PROFILE_UUID,
FULL_PROFILE_ID,
&description,
"Numa DNS",
)
}
ProfileMode::CaOnly => wrap_plist(
&ca_payload,
CA_ONLY_PROFILE_UUID,
CA_ONLY_PROFILE_ID,
"Trusts the Numa local Certificate Authority. Does not change your DNS settings.",
"Numa CA",
),
}
}
/// Strip the PEM header/footer and newlines from a CA cert, leaving raw
/// base64 for embedding in a plist `<data>` block.
fn pem_to_base64(pem: &str) -> String {
pem.lines()
.filter(|line| !line.starts_with("-----"))
.collect::<String>()
}
/// Wrap the base64 CA cert at 52 chars per line for plist readability
/// (matches Apple convention in hand-written profiles).
fn chunk_base64(base64: &str) -> String {
base64
.chars()
.collect::<Vec<_>>()
.chunks(52)
.map(|chunk| format!("\t\t\t{}", chunk.iter().collect::<String>()))
.collect::<Vec<_>>()
.join("\n")
}
/// Render the `com.apple.security.root` payload dict containing the CA cert.
fn build_ca_payload(ca_pem: &str) -> String {
let ca_wrapped = chunk_base64(&pem_to_base64(ca_pem));
format!(
r#" <dict>
<key>PayloadCertificateFileName</key>
<string>numa-ca.pem</string>
<key>PayloadContent</key>
<data>
{ca}
</data>
<key>PayloadDescription</key>
<string>Numa local Certificate Authority — required for DoT trust</string>
<key>PayloadDisplayName</key>
<string>Numa Local CA</string>
<key>PayloadIdentifier</key>
<string>{ca_id}</string>
<key>PayloadType</key>
<string>com.apple.security.root</string>
<key>PayloadUUID</key>
<string>{ca_uuid}</string>
<key>PayloadVersion</key>
<integer>1</integer>
</dict>"#,
ca = ca_wrapped,
ca_id = CA_PAYLOAD_ID,
ca_uuid = CA_PAYLOAD_UUID,
)
}
/// Render the `com.apple.dnsSettings.managed` payload dict for Full mode.
fn build_dns_payload(lan_ip: Ipv4Addr) -> String {
format!(
r#" <dict>
<key>DNSSettings</key>
<dict>
<key>DNSProtocol</key>
<string>TLS</string>
<key>ServerAddresses</key>
<array>
<string>{ip}</string>
</array>
<key>ServerName</key>
<string>numa.numa</string>
</dict>
<key>OnDemandRules</key>
<array>
<dict>
<key>Action</key>
<string>Connect</string>
<key>InterfaceTypeMatch</key>
<string>WiFi</string>
</dict>
<dict>
<key>Action</key>
<string>Disconnect</string>
</dict>
</array>
<key>PayloadDescription</key>
<string>Routes DNS queries through Numa over DoT when on Wi-Fi</string>
<key>PayloadDisplayName</key>
<string>Numa DNS-over-TLS</string>
<key>PayloadIdentifier</key>
<string>{dns_id}</string>
<key>PayloadType</key>
<string>com.apple.dnsSettings.managed</string>
<key>PayloadUUID</key>
<string>{dns_uuid}</string>
<key>PayloadVersion</key>
<integer>1</integer>
</dict>"#,
ip = lan_ip,
dns_id = DNS_PAYLOAD_ID,
dns_uuid = DNS_PAYLOAD_UUID,
)
}
/// Wrap one or more payload dicts in the top-level plist structure
/// with Configuration type, PayloadContent array, and profile metadata.
fn wrap_plist(
payloads: &str,
top_uuid: &str,
top_id: &str,
description: &str,
display_name: &str,
) -> String {
format!(
r#"<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>PayloadContent</key>
<array>
{payloads}
</array>
<key>PayloadDescription</key>
<string>{description}</string>
<key>PayloadDisplayName</key>
<string>{display_name}</string>
<key>PayloadIdentifier</key>
<string>{top_id}</string>
<key>PayloadRemovalDisallowed</key>
<false/>
<key>PayloadType</key>
<string>Configuration</string>
<key>PayloadUUID</key>
<string>{top_uuid}</string>
<key>PayloadVersion</key>
<integer>1</integer>
</dict>
</plist>
"#,
payloads = payloads,
description = description,
display_name = display_name,
top_id = top_id,
top_uuid = top_uuid,
)
}
#[cfg(test)]
mod tests {
use super::*;
const SAMPLE_PEM: &str =
"-----BEGIN CERTIFICATE-----\nMIIBkDCCATagAwIBAgIUTEST\n-----END CERTIFICATE-----\n";
#[test]
fn pem_to_base64_strips_headers() {
let pem = "-----BEGIN CERTIFICATE-----\nABCDEF\nGHIJKL\n-----END CERTIFICATE-----\n";
assert_eq!(pem_to_base64(pem), "ABCDEFGHIJKL");
}
#[test]
fn full_profile_contains_ip_and_ca() {
let config = build_mobileconfig(
ProfileMode::Full {
lan_ip: Ipv4Addr::new(192, 168, 1, 100),
},
SAMPLE_PEM,
);
assert!(config.contains("192.168.1.100"));
assert!(config.contains("MIIBkDCCATagAwIBAgIUTEST"));
assert!(config.contains("com.apple.security.root"));
assert!(config.contains("com.apple.dnsSettings.managed"));
assert!(config.contains("DNSProtocol"));
assert!(config.contains(FULL_PROFILE_UUID));
assert!(config.contains(FULL_PROFILE_ID));
}
#[test]
fn ca_only_profile_contains_ca_but_not_dns() {
let config = build_mobileconfig(ProfileMode::CaOnly, SAMPLE_PEM);
assert!(config.contains("MIIBkDCCATagAwIBAgIUTEST"));
assert!(config.contains("com.apple.security.root"));
assert!(!config.contains("com.apple.dnsSettings.managed"));
assert!(!config.contains("DNSProtocol"));
assert!(!config.contains("ServerAddresses"));
assert!(config.contains(CA_ONLY_PROFILE_UUID));
assert!(config.contains(CA_ONLY_PROFILE_ID));
}
#[test]
fn full_and_ca_only_have_distinct_top_uuids() {
let full = build_mobileconfig(
ProfileMode::Full {
lan_ip: Ipv4Addr::new(10, 0, 0, 1),
},
SAMPLE_PEM,
);
let ca_only = build_mobileconfig(ProfileMode::CaOnly, SAMPLE_PEM);
assert!(full.contains(FULL_PROFILE_UUID));
assert!(!full.contains(CA_ONLY_PROFILE_UUID));
assert!(ca_only.contains(CA_ONLY_PROFILE_UUID));
assert!(!ca_only.contains(FULL_PROFILE_UUID));
}
#[test]
fn both_modes_share_ca_payload_uuid() {
let full = build_mobileconfig(
ProfileMode::Full {
lan_ip: Ipv4Addr::new(10, 0, 0, 1),
},
SAMPLE_PEM,
);
let ca_only = build_mobileconfig(ProfileMode::CaOnly, SAMPLE_PEM);
assert!(full.contains(CA_PAYLOAD_UUID));
assert!(ca_only.contains(CA_PAYLOAD_UUID));
}
}

489
src/odoh.rs Normal file
View File

@@ -0,0 +1,489 @@
//! ODoH target-config fetcher and TTL cache (RFC 9230 §6).
//!
//! ## Ciphersuite policy
//! `odoh-rs` deserialization rejects any config whose KEM/KDF/AEAD triple is
//! not the mandatory `(X25519, HKDF-SHA256, AES-128-GCM)` (see
//! `ObliviousDoHConfigContents::deserialize`). This is stricter than the
//! plan's "pick the mandatory suite if mixed": a response containing *any*
//! non-mandatory config fails parse entirely. Real-world targets publish a
//! single mandatory config, so this is fine in practice; revisit if a target
//! that matters starts mixing suites.
use std::sync::Arc;
use std::time::{Duration, Instant};
use arc_swap::ArcSwapOption;
use odoh_rs::{
ObliviousDoHConfigContents, ObliviousDoHConfigs, ObliviousDoHMessage,
ObliviousDoHMessagePlaintext,
};
use rand_core::{OsRng, TryRngCore};
use reqwest::header::HeaderMap;
use tokio::sync::Mutex;
use tokio::time::timeout;
use crate::Result;
/// MIME type used for both directions of the ODoH exchange (RFC 9230 §4).
pub(crate) const ODOH_CONTENT_TYPE: &str = "application/oblivious-dns-message";
/// Cap on the response body we read into memory when the relay returns
/// non-success. Protects against a hostile relay streaming a huge body on
/// the error path; keeps enough room to carry a human-readable reason.
const ERROR_BODY_PREVIEW_BYTES: usize = 1024;
/// Fallback TTL when the target's response lacks a usable `Cache-Control`
/// directive. RFC 9230 §6.2 places no hard floor; 24 h matches what Cloudflare
/// publishes in practice.
const DEFAULT_CONFIG_TTL: Duration = Duration::from_secs(24 * 60 * 60);
/// Cap on any TTL we'll honour, regardless of what the target advertises.
/// Keeps a misconfigured server from pinning an old key indefinitely.
const MAX_CONFIG_TTL: Duration = Duration::from_secs(7 * 24 * 60 * 60);
/// After a failed `/.well-known/odohconfigs` fetch, refuse to refetch again
/// within this window — a target that is genuinely broken would otherwise
/// receive one request per query. Queries that arrive during the backoff
/// return the cached error immediately.
const REFRESH_BACKOFF: Duration = Duration::from_secs(60);
/// Parsed ODoH target config plus the freshness metadata needed to age it out.
#[derive(Debug)]
pub struct OdohTargetConfig {
pub contents: ObliviousDoHConfigContents,
pub key_id: Vec<u8>,
expires_at: Instant,
}
impl OdohTargetConfig {
pub fn is_expired(&self) -> bool {
Instant::now() >= self.expires_at
}
}
struct FailedRefresh {
at: Instant,
err: String,
}
/// TTL-gated cache of a single target's HPKE config.
///
/// Reads go through `ArcSwapOption` (lock-free hot path). Refreshes serialize
/// on an async mutex so a burst of simultaneous misses produces a single
/// outbound fetch, and a failed refresh blocks subsequent refetches for
/// [`REFRESH_BACKOFF`] to prevent hot-looping against a broken target.
pub struct OdohConfigCache {
target_host: String,
configs_url: String,
client: reqwest::Client,
current: ArcSwapOption<OdohTargetConfig>,
last_failure: ArcSwapOption<FailedRefresh>,
refresh_lock: Mutex<()>,
}
impl OdohConfigCache {
pub fn new(target_host: String, client: reqwest::Client) -> Self {
let configs_url = format!("https://{}/.well-known/odohconfigs", target_host);
Self {
target_host,
configs_url,
client,
current: ArcSwapOption::from(None),
last_failure: ArcSwapOption::from(None),
refresh_lock: Mutex::new(()),
}
}
pub fn target_host(&self) -> &str {
&self.target_host
}
/// Return a valid config, refetching when the cache is cold or expired.
/// Within [`REFRESH_BACKOFF`] of a failed refresh, returns the cached
/// error without issuing another fetch.
pub async fn get(&self) -> Result<Arc<OdohTargetConfig>> {
if let Some(cfg) = self.current.load_full() {
if !cfg.is_expired() {
return Ok(cfg);
}
}
if let Some(err) = self.backoff_error() {
return Err(err);
}
let _guard = self.refresh_lock.lock().await;
// Another task may have refreshed or failed while we waited.
if let Some(cfg) = self.current.load_full() {
if !cfg.is_expired() {
return Ok(cfg);
}
}
if let Some(err) = self.backoff_error() {
return Err(err);
}
match fetch_odoh_config(&self.client, &self.configs_url).await {
Ok(fresh) => {
let fresh = Arc::new(fresh);
self.current.store(Some(fresh.clone()));
self.last_failure.store(None);
Ok(fresh)
}
Err(e) => {
let msg = format!("ODoH config fetch failed: {e}");
self.last_failure.store(Some(Arc::new(FailedRefresh {
at: Instant::now(),
err: msg.clone(),
})));
Err(msg.into())
}
}
}
/// Drop the cached config. Called after the target rejects ciphertext
/// (key rotation race) so the next `get()` refetches.
pub fn invalidate(&self) {
self.current.store(None);
}
fn backoff_error(&self) -> Option<crate::Error> {
let fail = self.last_failure.load_full()?;
if fail.at.elapsed() < REFRESH_BACKOFF {
Some(format!("{} (backoff active)", fail.err).into())
} else {
None
}
}
}
/// Fetch `/.well-known/odohconfigs` from `configs_url` and parse it into an
/// [`OdohTargetConfig`]. The TTL is taken from the response's
/// `Cache-Control: max-age=`, clamped to [`DEFAULT_CONFIG_TTL`,
/// [`MAX_CONFIG_TTL`]] when absent or obviously wrong.
pub async fn fetch_odoh_config(
client: &reqwest::Client,
configs_url: &str,
) -> Result<OdohTargetConfig> {
let resp = client.get(configs_url).send().await?.error_for_status()?;
let ttl = cache_control_ttl(resp.headers()).unwrap_or(DEFAULT_CONFIG_TTL);
let body = resp.bytes().await?;
parse_odoh_config(&body, ttl)
}
fn parse_odoh_config(body: &[u8], ttl: Duration) -> Result<OdohTargetConfig> {
let mut buf = body;
let configs: ObliviousDoHConfigs = odoh_rs::parse(&mut buf)
.map_err(|e| format!("failed to parse ObliviousDoHConfigs: {e}"))?;
let first = configs
.into_iter()
.next()
.ok_or("target published no ODoH configs with a supported version + ciphersuite")?;
let contents: ObliviousDoHConfigContents = first.into();
let key_id = contents
.identifier()
.map_err(|e| format!("failed to derive key_id from ODoH config: {e}"))?;
Ok(OdohTargetConfig {
contents,
key_id,
expires_at: Instant::now() + ttl.min(MAX_CONFIG_TTL),
})
}
/// Send a DNS wire query through an ODoH relay to a target and return the
/// plaintext DNS wire response.
///
/// Flow: fetch the target's HPKE config (cached), seal the query, POST to the
/// relay with `Targethost`/`Targetpath` headers, then unseal the response.
/// On seal/unseal failure we invalidate the cache and retry once — this
/// handles the benign race where the target rotated its key between our
/// cached config and the POST.
pub async fn query_through_relay(
wire: &[u8],
relay_url: &str,
target_path: &str,
client: &reqwest::Client,
cache: &OdohConfigCache,
timeout_duration: Duration,
) -> Result<Vec<u8>> {
let req = OdohRequest {
wire,
relay_url,
target_path,
client,
cache,
timeout: timeout_duration,
};
match attempt_query(&req).await {
Ok(v) => Ok(v),
Err(AttemptError::KeyRotation(_)) => {
cache.invalidate();
attempt_query(&req).await.map_err(AttemptError::into_error)
}
Err(e) => Err(e.into_error()),
}
}
struct OdohRequest<'a> {
wire: &'a [u8],
relay_url: &'a str,
target_path: &'a str,
client: &'a reqwest::Client,
cache: &'a OdohConfigCache,
timeout: Duration,
}
/// Classification used only by the retry path in [`query_through_relay`].
enum AttemptError {
/// Target signalled the config we used is stale (key rotation race).
/// Callers should invalidate the cache and retry exactly once.
KeyRotation(String),
/// Any other failure — transport, timeout, malformed response.
Other(crate::Error),
}
impl AttemptError {
fn into_error(self) -> crate::Error {
match self {
AttemptError::KeyRotation(m) => format!("ODoH key rotation race: {m}").into(),
AttemptError::Other(e) => e,
}
}
}
async fn attempt_query(req: &OdohRequest<'_>) -> std::result::Result<Vec<u8>, AttemptError> {
let cfg = req.cache.get().await.map_err(AttemptError::Other)?;
let plaintext = ObliviousDoHMessagePlaintext::new(req.wire, 0);
// rand_core 0.9's OsRng is fallible-only; wrap for the infallible bound.
let mut os = OsRng;
let mut rng = os.unwrap_mut();
let (encrypted_query, client_secret) =
odoh_rs::encrypt_query(&plaintext, &cfg.contents, &mut rng)
.map_err(|e| AttemptError::Other(format!("ODoH encrypt failed: {e}").into()))?;
let body = odoh_rs::compose(&encrypted_query)
.map_err(|e| AttemptError::Other(format!("ODoH compose failed: {e}").into()))?
.freeze();
// RFC 9230 §5 and the reference client use URL query parameters, not
// HTTP headers, to carry the target routing. `Targethost`/`Targetpath`
// headers cause relays to treat the request as an unspecified-target and
// reject it.
let (status, resp_body) = timeout(req.timeout, async {
let resp = req
.client
.post(req.relay_url)
.header(reqwest::header::CONTENT_TYPE, ODOH_CONTENT_TYPE)
.header(reqwest::header::ACCEPT, ODOH_CONTENT_TYPE)
.header(reqwest::header::CACHE_CONTROL, "no-cache, no-store")
.query(&[
("targethost", req.cache.target_host()),
("targetpath", req.target_path),
])
.body(body)
.send()
.await?;
let status = resp.status();
let body = resp.bytes().await?;
Ok::<_, reqwest::Error>((status, body))
})
.await
.map_err(|_| AttemptError::Other("ODoH relay request timed out".into()))?
.map_err(|e| AttemptError::Other(format!("ODoH relay request failed: {e}").into()))?;
// RFC 9230 §4.3 expects a target that can't decrypt to reply with a DNS
// error in a sealed 200 response; a 401 from the relay/target is the
// practical signal that our cached HPKE key is stale. Treat 400 as a
// client-side bug (malformed ODoH envelope) — retrying would loop-fail.
if !status.is_success() {
let preview_len = resp_body.len().min(ERROR_BODY_PREVIEW_BYTES);
let body_preview = String::from_utf8_lossy(&resp_body[..preview_len]);
let msg = format!("ODoH relay returned {status}: {}", body_preview.trim());
return Err(if status.as_u16() == 401 {
AttemptError::KeyRotation(msg)
} else {
AttemptError::Other(msg.into())
});
}
let mut buf = resp_body;
let encrypted_response: ObliviousDoHMessage = odoh_rs::parse(&mut buf)
.map_err(|e| AttemptError::Other(format!("ODoH response parse failed: {e}").into()))?;
let plaintext_response =
odoh_rs::decrypt_response(&plaintext, &encrypted_response, client_secret)
.map_err(|e| AttemptError::KeyRotation(format!("ODoH decrypt failed: {e}")))?;
Ok(plaintext_response.into_msg().to_vec())
}
fn cache_control_ttl(headers: &HeaderMap) -> Option<Duration> {
let cc = headers.get(reqwest::header::CACHE_CONTROL)?.to_str().ok()?;
for directive in cc.split(',') {
let directive = directive.trim();
if let Some(rest) = directive.strip_prefix("max-age=") {
if let Ok(secs) = rest.trim().parse::<u64>() {
if secs > 0 {
return Some(Duration::from_secs(secs));
}
}
}
}
None
}
#[cfg(test)]
mod tests {
use super::*;
use odoh_rs::{ObliviousDoHConfig, ObliviousDoHKeyPair};
// RFC 9180 HPKE IDs for the sole ODoH mandatory suite:
// KEM = X25519, KDF = HKDF-SHA256, AEAD = AES-128-GCM.
const KEM_X25519: u16 = 0x0020;
const KDF_SHA256: u16 = 0x0001;
const AEAD_AES128GCM: u16 = 0x0001;
fn synth_configs_bytes() -> Vec<u8> {
let kp = ObliviousDoHKeyPair::from_parameters(
KEM_X25519,
KDF_SHA256,
AEAD_AES128GCM,
&[0u8; 32],
);
let pk = kp.public().clone();
let configs: ObliviousDoHConfigs = vec![ObliviousDoHConfig::from(pk)].into();
odoh_rs::compose(&configs).unwrap().to_vec()
}
#[test]
fn parse_accepts_well_formed_config() {
let bytes = synth_configs_bytes();
let cfg = parse_odoh_config(&bytes, Duration::from_secs(3600)).unwrap();
assert!(!cfg.key_id.is_empty());
assert!(!cfg.is_expired());
}
#[test]
fn parse_rejects_garbage() {
let bytes = [0xffu8; 16];
assert!(parse_odoh_config(&bytes, Duration::from_secs(3600)).is_err());
}
#[test]
fn parse_rejects_empty() {
assert!(parse_odoh_config(&[], Duration::from_secs(3600)).is_err());
}
#[test]
fn ttl_capped_at_max() {
let bytes = synth_configs_bytes();
let cfg = parse_odoh_config(&bytes, Duration::from_secs(100 * 24 * 60 * 60)).unwrap();
let remaining = cfg.expires_at.saturating_duration_since(Instant::now());
assert!(remaining <= MAX_CONFIG_TTL);
assert!(remaining >= MAX_CONFIG_TTL - Duration::from_secs(1));
}
#[test]
fn cache_control_parses_max_age() {
let mut h = HeaderMap::new();
h.insert("cache-control", "public, max-age=86400".parse().unwrap());
assert_eq!(cache_control_ttl(&h), Some(Duration::from_secs(86400)));
}
#[test]
fn cache_control_ignores_max_age_zero() {
let mut h = HeaderMap::new();
h.insert("cache-control", "max-age=0, no-store".parse().unwrap());
assert_eq!(cache_control_ttl(&h), None);
}
#[test]
fn cache_control_missing_falls_back() {
let h = HeaderMap::new();
assert_eq!(cache_control_ttl(&h), None);
}
#[test]
fn is_expired_tracks_ttl() {
let bytes = synth_configs_bytes();
let mut cfg = parse_odoh_config(&bytes, Duration::from_secs(3600)).unwrap();
assert!(!cfg.is_expired());
cfg.expires_at = Instant::now() - Duration::from_secs(1);
assert!(cfg.is_expired());
}
#[tokio::test]
async fn cache_backoff_blocks_refetch_after_failure() {
// Point the cache at a host that does not exist so the fetch fails
// deterministically; this exercises the backoff wiring without a
// network round-trip succeeding.
let cache = OdohConfigCache::new(
"odoh-target.invalid".to_string(),
reqwest::Client::builder()
.timeout(Duration::from_millis(200))
.build()
.unwrap(),
);
let first = cache.get().await;
assert!(first.is_err(), "first fetch must fail against invalid host");
// Within the backoff window, the cached error is returned immediately.
let second = cache.get().await.unwrap_err().to_string();
assert!(
second.contains("backoff active"),
"expected backoff hint, got: {second}"
);
// Reaching past the backoff window allows a fresh attempt — simulate
// by rewinding the recorded failure timestamp.
cache.last_failure.store(Some(Arc::new(FailedRefresh {
at: Instant::now() - (REFRESH_BACKOFF + Duration::from_secs(1)),
err: "prior".to_string(),
})));
let third = cache.get().await.unwrap_err().to_string();
assert!(
!third.contains("backoff active"),
"expected fresh fetch attempt, got: {third}"
);
}
/// Round-trip the HPKE seal/unseal path in isolation from HTTP, using the
/// odoh-rs primitives that `query_through_relay` wires together. Guards
/// against silently breaking the crypto glue if we refactor that path.
#[test]
fn seal_unseal_round_trip() {
use odoh_rs::{decrypt_query, encrypt_response, ResponseNonce};
let kp = ObliviousDoHKeyPair::from_parameters(
KEM_X25519,
KDF_SHA256,
AEAD_AES128GCM,
&[0u8; 32],
);
let query_wire = b"\x12\x34\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x07example\x03com\x00\x00\x01\x00\x01";
let query_pt = ObliviousDoHMessagePlaintext::new(query_wire, 0);
let mut os = OsRng;
let mut rng = os.unwrap_mut();
let (query_enc, client_secret) =
odoh_rs::encrypt_query(&query_pt, kp.public(), &mut rng).unwrap();
let (query_back, server_secret) = decrypt_query(&query_enc, &kp).unwrap();
assert_eq!(query_back.into_msg().as_ref(), query_wire);
let response_wire = b"\x12\x34\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00";
let response_pt = ObliviousDoHMessagePlaintext::new(response_wire, 0);
let response_enc = encrypt_response(
&query_pt,
&response_pt,
server_secret,
ResponseNonce::default(),
)
.unwrap();
let response_back =
odoh_rs::decrypt_response(&query_pt, &response_enc, client_secret).unwrap();
assert_eq!(response_back.into_msg().as_ref(), response_wire);
}
}

View File

@@ -85,6 +85,14 @@ impl DnsPacket {
+ self.edns.as_ref().map_or(0, |e| e.options.capacity())
}
/// Apply `f` to every record in the three RR sections (answers,
/// authorities, resources). Does not touch questions or edns.
pub fn for_each_record_mut(&mut self, mut f: impl FnMut(&mut DnsRecord)) {
self.answers.iter_mut().for_each(&mut f);
self.authorities.iter_mut().for_each(&mut f);
self.resources.iter_mut().for_each(&mut f);
}
pub fn response_from(query: &DnsPacket, rescode: crate::header::ResultCode) -> DnsPacket {
let mut resp = DnsPacket::new();
resp.header.id = query.header.id;

View File

@@ -4,7 +4,7 @@ use std::sync::Arc;
use axum::body::Body;
use axum::extract::{Request, State};
use axum::response::IntoResponse;
use axum::routing::any;
use axum::routing::{any, post};
use axum::Router;
use http_body_util::BodyExt;
use hyper::StatusCode;
@@ -18,6 +18,14 @@ use crate::ctx::ServerCtx;
type HttpClient = Client<hyper_util::client::legacy::connect::HttpConnector, Body>;
/// State passed to the DoH handler. Includes the remote address so
/// `resolve_query` can log the client IP.
#[derive(Clone)]
pub struct DohState {
pub ctx: Arc<ServerCtx>,
pub remote_addr: Option<std::net::SocketAddr>,
}
#[derive(Clone)]
struct ProxyState {
ctx: Arc<ServerCtx>,
@@ -74,9 +82,17 @@ pub async fn start_proxy_tls(ctx: Arc<ServerCtx>, port: u16, bind_addr: Ipv4Addr
// Hold a separate Arc so we can access tls_config after ctx moves into ProxyState
let tls_holder = Arc::clone(&ctx);
let state = ProxyState { ctx, client };
let proxy_state = ProxyState {
ctx: Arc::clone(&ctx),
client,
};
let app = Router::new().fallback(any(proxy_handler)).with_state(state);
// DoH route (RFC 8484) served only on the TLS listener.
// DohState.remote_addr is set per-connection below.
let doh_state = DohState {
ctx,
remote_addr: None,
};
loop {
let (tcp_stream, remote_addr) = match listener.accept().await {
@@ -91,7 +107,17 @@ pub async fn start_proxy_tls(ctx: Arc<ServerCtx>, port: u16, bind_addr: Ipv4Addr
// unwrap safe: guarded by is_none() check above
let acceptor =
TlsAcceptor::from(Arc::clone(&*tls_holder.tls_config.as_ref().unwrap().load()));
let app = app.clone();
let mut conn_doh_state = doh_state.clone();
conn_doh_state.remote_addr = Some(remote_addr);
let app = Router::new()
.route(
"/dns-query",
post(crate::doh::doh_post).with_state(conn_doh_state),
)
.fallback(any(proxy_handler))
.with_state(proxy_state.clone());
tokio::spawn(async move {
let tls_stream = match acceptor.accept(tcp_stream).await {
@@ -232,7 +258,7 @@ pre .str {{ color: #d48a5a }}
)
}
fn extract_host(req: &Request) -> Option<String> {
pub fn extract_host(req: &Request) -> Option<String> {
req.headers()
.get(hyper::header::HOST)
.and_then(|v| v.to_str().ok())

View File

@@ -5,7 +5,7 @@ use std::time::SystemTime;
use crate::cache::DnssecStatus;
use crate::header::ResultCode;
use crate::question::QueryType;
use crate::stats::QueryPath;
use crate::stats::{QueryPath, Transport};
pub struct QueryLogEntry {
pub timestamp: SystemTime,
@@ -13,6 +13,7 @@ pub struct QueryLogEntry {
pub domain: String,
pub query_type: QueryType,
pub path: QueryPath,
pub transport: Transport,
pub rescode: ResultCode,
pub latency_us: u64,
pub dnssec: DnssecStatus,
@@ -107,6 +108,7 @@ mod tests {
domain: "example.com".into(),
query_type: QueryType::A,
path: QueryPath::Forwarded,
transport: Transport::Udp,
rescode: ResultCode::NOERROR,
latency_us: 500,
dnssec: DnssecStatus::Indeterminate,

View File

@@ -1,114 +1,66 @@
use crate::buffer::BytePacketBuffer;
use crate::Result;
#[derive(PartialEq, Eq, Debug, Clone, Hash, Copy)]
pub enum QueryType {
UNKNOWN(u16),
A, // 1
NS, // 2
CNAME, // 5
SOA, // 6
PTR, // 12
MX, // 15
TXT, // 16
AAAA, // 28
SRV, // 33
DS, // 43
RRSIG, // 46
NSEC, // 47
DNSKEY, // 48
NSEC3, // 50
OPT, // 41 (EDNS0 pseudo-type)
HTTPS, // 65
macro_rules! define_qtypes {
( $( $variant:ident = $num:literal, $str:literal ),* $(,)? ) => {
#[derive(PartialEq, Eq, Debug, Clone, Hash, Copy)]
pub enum QueryType {
UNKNOWN(u16),
$( $variant, )*
}
impl QueryType {
pub fn to_num(&self) -> u16 {
match *self {
QueryType::UNKNOWN(x) => x,
$( QueryType::$variant => $num, )*
}
}
pub fn from_num(num: u16) -> QueryType {
match num {
$( $num => QueryType::$variant, )*
_ => QueryType::UNKNOWN(num),
}
}
pub fn as_str(&self) -> &'static str {
match self {
QueryType::UNKNOWN(_) => "UNKNOWN",
$( QueryType::$variant => $str, )*
}
}
pub fn parse_str(s: &str) -> Option<QueryType> {
match s.to_ascii_uppercase().as_str() {
$( $str => Some(QueryType::$variant), )*
_ => None,
}
}
}
};
}
impl QueryType {
pub fn to_num(&self) -> u16 {
match *self {
QueryType::UNKNOWN(x) => x,
QueryType::A => 1,
QueryType::NS => 2,
QueryType::CNAME => 5,
QueryType::SOA => 6,
QueryType::PTR => 12,
QueryType::MX => 15,
QueryType::TXT => 16,
QueryType::AAAA => 28,
QueryType::SRV => 33,
QueryType::OPT => 41,
QueryType::DS => 43,
QueryType::RRSIG => 46,
QueryType::NSEC => 47,
QueryType::DNSKEY => 48,
QueryType::NSEC3 => 50,
QueryType::HTTPS => 65,
}
}
pub fn from_num(num: u16) -> QueryType {
match num {
1 => QueryType::A,
2 => QueryType::NS,
5 => QueryType::CNAME,
6 => QueryType::SOA,
12 => QueryType::PTR,
15 => QueryType::MX,
16 => QueryType::TXT,
28 => QueryType::AAAA,
33 => QueryType::SRV,
41 => QueryType::OPT,
43 => QueryType::DS,
46 => QueryType::RRSIG,
47 => QueryType::NSEC,
48 => QueryType::DNSKEY,
50 => QueryType::NSEC3,
65 => QueryType::HTTPS,
_ => QueryType::UNKNOWN(num),
}
}
pub fn as_str(&self) -> &'static str {
match self {
QueryType::A => "A",
QueryType::NS => "NS",
QueryType::CNAME => "CNAME",
QueryType::SOA => "SOA",
QueryType::PTR => "PTR",
QueryType::MX => "MX",
QueryType::TXT => "TXT",
QueryType::AAAA => "AAAA",
QueryType::SRV => "SRV",
QueryType::OPT => "OPT",
QueryType::DS => "DS",
QueryType::RRSIG => "RRSIG",
QueryType::NSEC => "NSEC",
QueryType::DNSKEY => "DNSKEY",
QueryType::NSEC3 => "NSEC3",
QueryType::HTTPS => "HTTPS",
QueryType::UNKNOWN(_) => "UNKNOWN",
}
}
pub fn parse_str(s: &str) -> Option<QueryType> {
match s.to_ascii_uppercase().as_str() {
"A" => Some(QueryType::A),
"NS" => Some(QueryType::NS),
"CNAME" => Some(QueryType::CNAME),
"SOA" => Some(QueryType::SOA),
"PTR" => Some(QueryType::PTR),
"MX" => Some(QueryType::MX),
"TXT" => Some(QueryType::TXT),
"AAAA" => Some(QueryType::AAAA),
"SRV" => Some(QueryType::SRV),
"DS" => Some(QueryType::DS),
"RRSIG" => Some(QueryType::RRSIG),
"DNSKEY" => Some(QueryType::DNSKEY),
"NSEC" => Some(QueryType::NSEC),
"NSEC3" => Some(QueryType::NSEC3),
"HTTPS" => Some(QueryType::HTTPS),
_ => None,
}
}
define_qtypes! {
A = 1, "A",
NS = 2, "NS",
CNAME = 5, "CNAME",
SOA = 6, "SOA",
PTR = 12, "PTR",
MX = 15, "MX",
TXT = 16, "TXT",
AAAA = 28, "AAAA",
LOC = 29, "LOC",
SRV = 33, "SRV",
NAPTR = 35, "NAPTR",
OPT = 41, "OPT",
DS = 43, "DS",
RRSIG = 46, "RRSIG",
NSEC = 47, "NSEC",
DNSKEY = 48, "DNSKEY",
NSEC3 = 50, "NSEC3",
SVCB = 64, "SVCB",
HTTPS = 65, "HTTPS",
}
#[derive(Debug, Clone, PartialEq, Eq)]

View File

@@ -15,8 +15,8 @@ use crate::srtt::SrttCache;
const MAX_REFERRAL_DEPTH: u8 = 10;
const MAX_CNAME_DEPTH: u8 = 8;
const NS_QUERY_TIMEOUT: Duration = Duration::from_millis(800);
const TCP_TIMEOUT: Duration = Duration::from_millis(1500);
const NS_QUERY_TIMEOUT: Duration = Duration::from_millis(400);
const TCP_TIMEOUT: Duration = Duration::from_millis(400);
const UDP_FAIL_THRESHOLD: u8 = 3;
static QUERY_ID: AtomicU16 = AtomicU16::new(1);
@@ -202,23 +202,24 @@ pub(crate) fn resolve_iterative<'a>(
let mut ns_idx = 0;
for _ in 0..MAX_REFERRAL_DEPTH {
let ns_addr = match ns_addrs.get(ns_idx) {
Some(addr) => *addr,
None => return Err("no nameserver available".into()),
};
if ns_idx >= ns_addrs.len() {
return Err("no nameserver available".into());
}
let (q_name, q_type) = minimize_query(qname, qtype, &current_zone);
debug!(
"recursive: querying {} for {:?} {} (zone: {}, depth {})",
ns_addr, q_type, q_name, current_zone, referral_depth
"recursive: querying {} (+ hedge) for {:?} {} (zone: {}, depth {})",
ns_addrs[ns_idx], q_type, q_name, current_zone, referral_depth
);
let response = match send_query(q_name, q_type, ns_addr, srtt).await {
let response = match send_query_hedged(q_name, q_type, &ns_addrs[ns_idx..], srtt).await
{
Ok(r) => r,
Err(e) => {
debug!("recursive: NS {} failed: {}", ns_addr, e);
ns_idx += 1;
debug!("recursive: NS query failed: {}", e);
let remaining = ns_addrs.len().saturating_sub(ns_idx);
ns_idx += remaining.min(2);
continue;
}
};
@@ -228,6 +229,9 @@ pub(crate) fn resolve_iterative<'a>(
{
if let Some(zone) = referral_zone(&response) {
current_zone = zone;
let mut cache_w = cache.write().unwrap();
cache_ns_delegation(&mut cache_w, &current_zone, &response);
drop(cache_w);
}
let mut all_ns = extract_ns_from_records(&response.answers);
if all_ns.is_empty() {
@@ -296,6 +300,7 @@ pub(crate) fn resolve_iterative<'a>(
{
let mut cache_w = cache.write().unwrap();
cache_ns_delegation(&mut cache_w, &current_zone, &response);
cache_ds_from_authority(&mut cache_w, &response);
}
let mut new_ns_addrs = resolve_ns_addrs_from_glue(&response, &ns_names, cache);
@@ -560,6 +565,23 @@ fn cache_ds_from_authority(cache: &mut DnsCache, response: &DnsPacket) {
}
}
/// Cache NS delegation records from a referral response so that
/// `find_closest_ns` can skip re-querying TLD servers on subsequent lookups.
fn cache_ns_delegation(cache: &mut DnsCache, zone: &str, response: &DnsPacket) {
let ns_records: Vec<_> = response
.authorities
.iter()
.filter(|r| matches!(r, DnsRecord::NS { .. }))
.cloned()
.collect();
if ns_records.is_empty() {
return;
}
let mut pkt = make_glue_packet();
pkt.answers = ns_records;
cache.insert(zone, QueryType::NS, &pkt);
}
fn make_glue_packet() -> DnsPacket {
let mut pkt = DnsPacket::new();
pkt.header.response = true;
@@ -587,6 +609,115 @@ async fn tcp_with_srtt(
}
}
/// Smart NS query: fire to two servers simultaneously when SRTT is unknown
/// (cold queries), or to the best server with SRTT-based hedge when known.
async fn send_query_hedged(
qname: &str,
qtype: QueryType,
servers: &[SocketAddr],
srtt: &RwLock<SrttCache>,
) -> crate::Result<DnsPacket> {
if servers.is_empty() {
return Err("no nameserver available".into());
}
if servers.len() == 1 {
return send_query(qname, qtype, servers[0], srtt).await;
}
let primary = servers[0];
let secondary = servers[1];
let primary_known = srtt.read().unwrap().is_known(primary.ip());
if !primary_known {
// Cold: fire both simultaneously, first response wins
debug!(
"recursive: parallel query to {} and {} for {:?} {}",
primary, secondary, qtype, qname
);
let fut_a = send_query(qname, qtype, primary, srtt);
let fut_b = send_query(qname, qtype, secondary, srtt);
tokio::pin!(fut_a);
tokio::pin!(fut_b);
// First Ok wins. If one errors, wait for the other.
let mut a_done = false;
let mut b_done = false;
let mut a_err: Option<crate::Error> = None;
let mut b_err: Option<crate::Error> = None;
loop {
tokio::select! {
r = &mut fut_a, if !a_done => {
match r {
Ok(resp) => return Ok(resp),
Err(e) => { a_done = true; a_err = Some(e); }
}
}
r = &mut fut_b, if !b_done => {
match r {
Ok(resp) => return Ok(resp),
Err(e) => { b_done = true; b_err = Some(e); }
}
}
}
match (a_err.take(), b_err.take()) {
(Some(e), Some(_)) => return Err(e),
(a, b) => {
a_err = a;
b_err = b;
}
}
}
} else {
// Warm: send to best, hedge after SRTT × 3 if slow
let hedge_ms = srtt.read().unwrap().get(primary.ip()) * 3;
let hedge_delay = Duration::from_millis(hedge_ms.max(50));
let fut_a = send_query(qname, qtype, primary, srtt);
tokio::pin!(fut_a);
let delay = tokio::time::sleep(hedge_delay);
tokio::pin!(delay);
tokio::select! {
r = &mut fut_a => return r,
_ = &mut delay => {}
}
debug!(
"recursive: hedging {} -> {} after {}ms for {:?} {}",
primary, secondary, hedge_ms, qtype, qname
);
let fut_b = send_query(qname, qtype, secondary, srtt);
tokio::pin!(fut_b);
// First Ok wins; if one errors, wait for the other.
let mut a_err: Option<crate::Error> = None;
let mut b_err: Option<crate::Error> = None;
loop {
tokio::select! {
r = &mut fut_a, if a_err.is_none() => {
match r {
Ok(resp) => return Ok(resp),
Err(e) => {
if b_err.is_some() { return Err(e); }
a_err = Some(e);
}
}
}
r = &mut fut_b, if b_err.is_none() => {
match r {
Ok(resp) => return Ok(resp),
Err(e) => {
if let Some(ae) = a_err.take() { return Err(ae); }
b_err = Some(e);
}
}
}
}
}
}
}
async fn send_query(
qname: &str,
qtype: QueryType,
@@ -634,9 +765,13 @@ async fn send_query(
"send_query: {} consecutive UDP failures — switching to TCP-first",
fails
);
// Now that UDP is disabled, retry this query via TCP
return tcp_with_srtt(&query, server, srtt, start).await;
}
debug!("send_query: UDP failed for {}: {}, trying TCP", server, e);
tcp_with_srtt(&query, server, srtt, start).await
// UDP works in general (priming succeeded) but this server timed out.
// Don't waste another 400ms on TCP — the server is unreachable.
srtt.write().unwrap().record_failure(server.ip());
Err(e)
}
}
}
@@ -678,6 +813,10 @@ mod tests {
use super::*;
use std::net::{Ipv4Addr, Ipv6Addr};
/// Tests that mutate the global UDP_DISABLED / UDP_FAILURES flags must hold
/// this lock to avoid racing with each other under `cargo test` parallelism.
static UDP_STATE_LOCK: std::sync::Mutex<()> = std::sync::Mutex::new(());
#[test]
fn extract_ns_from_authority() {
let mut pkt = DnsPacket::new();
@@ -916,10 +1055,11 @@ mod tests {
}
/// TCP-only server returns authoritative answer directly.
/// Verifies: UDP fails → TCP fallback → resolves.
/// Verifies: when UDP is disabled, TCP-first resolves.
#[tokio::test]
async fn tcp_fallback_resolves_when_udp_blocked() {
UDP_DISABLED.store(false, Ordering::Relaxed);
let _guard = UDP_STATE_LOCK.lock().unwrap();
UDP_DISABLED.store(true, Ordering::Relaxed);
UDP_FAILURES.store(0, Ordering::Release);
let server_addr = spawn_tcp_dns_server(|query| {
@@ -950,49 +1090,32 @@ mod tests {
}
}
/// Full iterative resolution through TCP-only mock: root referral → authoritative answer.
/// The mock plays both roles (returns referral for NS queries, answer for A queries).
/// TCP round-trip through mock: query → authoritative answer via forward_tcp.
/// Uses forward_tcp directly to avoid dependence on the global UDP_DISABLED flag
/// which is shared across concurrent tests.
#[tokio::test]
async fn tcp_only_iterative_resolution() {
UDP_DISABLED.store(true, Ordering::Release); // Skip UDP entirely for speed
let server_addr = spawn_tcp_dns_server(|query| {
let q = match query.questions.first() {
Some(q) => q,
None => return DnsPacket::response_from(query, ResultCode::SERVFAIL),
};
if q.qtype == QueryType::NS || q.name == "com" {
// Return referral — NS points back to ourselves (same IP, port 53 in glue
// won't work, but cache will have our address from root_hints)
let mut resp = DnsPacket::new();
resp.header.id = query.header.id;
resp.header.response = true;
resp.header.rescode = ResultCode::NOERROR;
resp.questions = query.questions.clone();
resp.authorities.push(DnsRecord::NS {
domain: "com".into(),
host: "ns1.com".into(),
ttl: 3600,
});
resp
} else {
// Return authoritative answer
let mut resp = DnsPacket::response_from(query, ResultCode::NOERROR);
resp.header.authoritative_answer = true;
resp.answers.push(DnsRecord::A {
domain: q.name.clone(),
addr: Ipv4Addr::new(10, 0, 0, 42),
ttl: 300,
});
resp
}
let mut resp = DnsPacket::response_from(query, ResultCode::NOERROR);
resp.header.authoritative_answer = true;
resp.answers.push(DnsRecord::A {
domain: q.name.clone(),
addr: Ipv4Addr::new(10, 0, 0, 42),
ttl: 300,
});
resp
})
.await;
let srtt = RwLock::new(SrttCache::new(true));
let result = send_query("hello.example.com", QueryType::A, server_addr, &srtt).await;
let resp = result.expect("TCP-only send_query should work");
let query = DnsPacket::query(0x1234, "hello.example.com", QueryType::A);
let resp = crate::forward::forward_tcp(&query, server_addr, TCP_TIMEOUT)
.await
.expect("TCP query should work");
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
match &resp.answers[0] {
DnsRecord::A { addr, .. } => assert_eq!(*addr, Ipv4Addr::new(10, 0, 0, 42)),
@@ -1002,7 +1125,8 @@ mod tests {
#[tokio::test]
async fn tcp_fallback_handles_nxdomain() {
UDP_DISABLED.store(false, Ordering::Relaxed);
let _guard = UDP_STATE_LOCK.lock().unwrap();
UDP_DISABLED.store(true, Ordering::Relaxed);
UDP_FAILURES.store(0, Ordering::Release);
let server_addr = spawn_tcp_dns_server(|query| {
@@ -1034,6 +1158,7 @@ mod tests {
#[tokio::test]
async fn udp_auto_disable_resets() {
let _guard = UDP_STATE_LOCK.lock().unwrap();
UDP_DISABLED.store(true, Ordering::Release);
UDP_FAILURES.store(5, Ordering::Relaxed);

342
src/relay.rs Normal file
View File

@@ -0,0 +1,342 @@
//! ODoH relay (RFC 9230 §5) — the forward-without-reading half of the
//! protocol. Runs `numa relay`; skips all resolver initialisation (no port
//! 53, no cache, no recursion, no dashboard). The relay never reads the
//! HPKE-sealed payload and keeps no per-request logs — only aggregate
//! counters.
use std::net::SocketAddr;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;
use axum::body::Bytes;
use axum::extract::{DefaultBodyLimit, Query, State};
use axum::http::{header, StatusCode};
use axum::response::{IntoResponse, Response};
use axum::routing::{get, post};
use axum::Router;
use log::{error, info};
use serde::Deserialize;
use tokio::net::TcpListener;
use crate::forward::build_https_client_with_pool;
use crate::odoh::ODOH_CONTENT_TYPE;
use crate::Result;
/// Cap on the opaque body we accept from a client. ODoH envelopes are
/// ~100300 bytes in practice; anything larger is malformed or hostile.
const MAX_BODY_BYTES: usize = 4 * 1024;
/// Cap on the body we read back from the target before streaming to client.
/// Slightly larger: target responses carry DNS answers plus HPKE overhead.
const MAX_TARGET_RESPONSE_BYTES: usize = 8 * 1024;
/// Covers the whole client-to-target round trip — not just `.send()` — so a
/// slow-drip target can't hang a worker indefinitely after headers arrive.
const TARGET_REQUEST_TIMEOUT: Duration = Duration::from_secs(5);
/// The relay hits many distinct target hosts on behalf of clients. A
/// per-host idle pool of 4 keeps warm TLS connections available for concurrent
/// fan-out without blowing up memory on a small VPS.
const RELAY_POOL_PER_HOST: usize = 4;
#[derive(Deserialize)]
struct RelayParams {
targethost: String,
targetpath: String,
}
struct RelayState {
client: reqwest::Client,
total_requests: AtomicU64,
forwarded_ok: AtomicU64,
forwarded_err: AtomicU64,
rejected_bad_request: AtomicU64,
}
impl RelayState {
fn new() -> Arc<Self> {
Arc::new(RelayState {
client: build_https_client_with_pool(RELAY_POOL_PER_HOST),
total_requests: AtomicU64::new(0),
forwarded_ok: AtomicU64::new(0),
forwarded_err: AtomicU64::new(0),
rejected_bad_request: AtomicU64::new(0),
})
}
}
/// `DefaultBodyLimit` overrides axum's 2 MiB default so hostile clients
/// can't force the relay to buffer multi-MB bodies before our own cap.
fn build_app(state: Arc<RelayState>) -> Router {
Router::new()
.route("/relay", post(handle_relay))
.layer(DefaultBodyLimit::max(MAX_BODY_BYTES))
.route("/health", get(handle_health))
.with_state(state)
}
pub async fn run(addr: SocketAddr) -> Result<()> {
let app = build_app(RelayState::new());
let listener = TcpListener::bind(addr).await?;
info!("ODoH relay listening on {}", addr);
axum::serve(listener, app).await?;
Ok(())
}
async fn handle_health(State(state): State<Arc<RelayState>>) -> impl IntoResponse {
let body = format!(
"ok\ntotal {}\nforwarded_ok {}\nforwarded_err {}\nrejected_bad_request {}\n",
state.total_requests.load(Ordering::Relaxed),
state.forwarded_ok.load(Ordering::Relaxed),
state.forwarded_err.load(Ordering::Relaxed),
state.rejected_bad_request.load(Ordering::Relaxed),
);
(
StatusCode::OK,
[(header::CONTENT_TYPE, "text/plain; charset=utf-8")],
body,
)
}
async fn handle_relay(
State(state): State<Arc<RelayState>>,
Query(params): Query<RelayParams>,
headers: axum::http::HeaderMap,
body: Bytes,
) -> Response {
state.total_requests.fetch_add(1, Ordering::Relaxed);
if !content_type_matches(&headers, ODOH_CONTENT_TYPE) {
state.rejected_bad_request.fetch_add(1, Ordering::Relaxed);
return (
StatusCode::UNSUPPORTED_MEDIA_TYPE,
"expected application/oblivious-dns-message",
)
.into_response();
}
if body.len() > MAX_BODY_BYTES {
state.rejected_bad_request.fetch_add(1, Ordering::Relaxed);
return (StatusCode::PAYLOAD_TOO_LARGE, "body exceeds 4 KiB cap").into_response();
}
if !is_valid_hostname(&params.targethost) || !params.targetpath.starts_with('/') {
state.rejected_bad_request.fetch_add(1, Ordering::Relaxed);
return (StatusCode::BAD_REQUEST, "invalid targethost or targetpath").into_response();
}
let target_url = format!("https://{}{}", params.targethost, params.targetpath);
match forward_to_target(&state.client, &target_url, body).await {
Ok((status, resp_body)) => {
state.forwarded_ok.fetch_add(1, Ordering::Relaxed);
(
status,
[(header::CONTENT_TYPE, ODOH_CONTENT_TYPE)],
resp_body,
)
.into_response()
}
Err(e) => {
// Log the underlying reason for operators; don't leak reqwest
// internals (which can reveal the target's TLS config, IP, etc.)
// back to arbitrary clients.
error!("relay forward to {} failed: {}", target_url, e);
state.forwarded_err.fetch_add(1, Ordering::Relaxed);
(StatusCode::BAD_GATEWAY, "target unreachable").into_response()
}
}
}
async fn forward_to_target(
client: &reqwest::Client,
url: &str,
body: Bytes,
) -> Result<(StatusCode, Bytes)> {
let response = tokio::time::timeout(TARGET_REQUEST_TIMEOUT, async {
let resp = client
.post(url)
.header(header::CONTENT_TYPE, ODOH_CONTENT_TYPE)
.header(header::ACCEPT, ODOH_CONTENT_TYPE)
.body(body)
.send()
.await?;
let status = StatusCode::from_u16(resp.status().as_u16())?;
let resp_body = resp.bytes().await?;
Ok::<_, crate::Error>((status, resp_body))
})
.await
.map_err(|_| "timed out talking to target")??;
if response.1.len() > MAX_TARGET_RESPONSE_BYTES {
return Err("target response exceeds cap".into());
}
Ok(response)
}
fn content_type_matches(headers: &axum::http::HeaderMap, expected: &str) -> bool {
headers
.get(header::CONTENT_TYPE)
.and_then(|v| v.to_str().ok())
.map(|ct| ct.split(';').next().unwrap_or("").trim() == expected)
.unwrap_or(false)
}
/// Strict DNS-hostname validator, aimed at closing the SSRF surface a naive
/// `contains('.')` check leaves open (e.g. `example.com@internal.host`,
/// `evil.com/../admin`). Requires ASCII letters/digits/dot/dash, at least
/// one dot, no leading dot or dash, length ≤ 253 per RFC 1035.
fn is_valid_hostname(h: &str) -> bool {
if h.is_empty() || h.len() > 253 || !h.contains('.') {
return false;
}
if h.starts_with('.') || h.starts_with('-') || h.ends_with('.') || h.ends_with('-') {
return false;
}
h.chars()
.all(|c| c.is_ascii_alphanumeric() || c == '.' || c == '-')
}
#[cfg(test)]
mod tests {
use super::*;
async fn spawn_relay() -> (SocketAddr, Arc<RelayState>) {
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let addr = listener.local_addr().unwrap();
let state = RelayState::new();
let app = build_app(state.clone());
tokio::spawn(async move {
let _ = axum::serve(listener, app).await;
});
(addr, state)
}
#[tokio::test]
async fn rejects_missing_content_type() {
let (addr, state) = spawn_relay().await;
let client = reqwest::Client::new();
let resp = client
.post(format!(
"http://{}/relay?targethost=odoh.example.com&targetpath=/dns-query",
addr
))
.body("body")
.send()
.await
.unwrap();
assert_eq!(resp.status(), reqwest::StatusCode::UNSUPPORTED_MEDIA_TYPE);
assert_eq!(state.rejected_bad_request.load(Ordering::Relaxed), 1);
}
#[tokio::test]
async fn rejects_oversized_body() {
let (addr, _state) = spawn_relay().await;
let big = vec![0u8; MAX_BODY_BYTES + 1];
let client = reqwest::Client::new();
let resp = client
.post(format!(
"http://{}/relay?targethost=odoh.example.com&targetpath=/dns-query",
addr
))
.header(header::CONTENT_TYPE, ODOH_CONTENT_TYPE)
.body(big)
.send()
.await
.unwrap();
// axum's DefaultBodyLimit rejects before our handler runs, so the
// counter doesn't increment — but the status code proves the layer
// enforced the cap. Either status is acceptable evidence.
assert!(matches!(
resp.status(),
reqwest::StatusCode::PAYLOAD_TOO_LARGE | reqwest::StatusCode::BAD_REQUEST
));
}
#[tokio::test]
async fn rejects_targethost_without_dot() {
let (addr, state) = spawn_relay().await;
let client = reqwest::Client::new();
let resp = client
.post(format!(
"http://{}/relay?targethost=localhost&targetpath=/dns-query",
addr
))
.header(header::CONTENT_TYPE, ODOH_CONTENT_TYPE)
.body("body")
.send()
.await
.unwrap();
assert_eq!(resp.status(), reqwest::StatusCode::BAD_REQUEST);
assert_eq!(state.rejected_bad_request.load(Ordering::Relaxed), 1);
}
#[tokio::test]
async fn rejects_userinfo_ssrf_attempt() {
let (addr, state) = spawn_relay().await;
let client = reqwest::Client::new();
// The naive contains('.') check would let this through and reqwest
// would route to `internal.host` using `evil.com` as userinfo.
let resp = client
.post(format!(
"http://{}/relay?targethost=evil.com@internal.host&targetpath=/dns-query",
addr
))
.header(header::CONTENT_TYPE, ODOH_CONTENT_TYPE)
.body("body")
.send()
.await
.unwrap();
assert_eq!(resp.status(), reqwest::StatusCode::BAD_REQUEST);
assert_eq!(state.rejected_bad_request.load(Ordering::Relaxed), 1);
}
#[tokio::test]
async fn rejects_targetpath_without_leading_slash() {
let (addr, state) = spawn_relay().await;
let client = reqwest::Client::new();
let resp = client
.post(format!(
"http://{}/relay?targethost=odoh.example.com&targetpath=dns-query",
addr
))
.header(header::CONTENT_TYPE, ODOH_CONTENT_TYPE)
.body("body")
.send()
.await
.unwrap();
assert_eq!(resp.status(), reqwest::StatusCode::BAD_REQUEST);
assert_eq!(state.rejected_bad_request.load(Ordering::Relaxed), 1);
}
#[tokio::test]
async fn health_endpoint_reports_counters() {
let (addr, _state) = spawn_relay().await;
let client = reqwest::Client::new();
let resp = client
.get(format!("http://{}/health", addr))
.send()
.await
.unwrap();
assert_eq!(resp.status(), reqwest::StatusCode::OK);
let body = resp.text().await.unwrap();
assert!(body.contains("ok\n"));
assert!(body.contains("forwarded_ok 0"));
}
#[test]
fn hostname_validator_accepts_and_rejects() {
assert!(is_valid_hostname("odoh.cloudflare-dns.com"));
assert!(is_valid_hostname("a.b"));
assert!(!is_valid_hostname(""));
assert!(!is_valid_hostname("localhost"));
assert!(!is_valid_hostname(".leading.dot"));
assert!(!is_valid_hostname("trailing.dot."));
assert!(!is_valid_hostname("-leading.dash"));
assert!(!is_valid_hostname("evil.com@internal.host"));
assert!(!is_valid_hostname("evil.com/../admin"));
assert!(!is_valid_hostname(&"a".repeat(254)));
}
}

665
src/serve.rs Normal file
View File

@@ -0,0 +1,665 @@
//! The main DNS-server runtime.
//!
//! Extracted from `main.rs` so both the interactive CLI entry and the
//! Windows service dispatcher (`windows_service` module) can drive the
//! same startup/serve loop.
use std::net::SocketAddr;
use std::sync::{Arc, Mutex, RwLock};
use std::time::Duration;
use arc_swap::ArcSwap;
use log::{error, info};
use tokio::net::UdpSocket;
use crate::blocklist::{download_blocklists, parse_blocklist, BlocklistStore};
use crate::buffer::BytePacketBuffer;
use crate::cache::DnsCache;
use crate::config::{build_zone_map, load_config, ConfigLoad};
use crate::ctx::{handle_query, ServerCtx};
use crate::forward::{
build_https_client, build_odoh_client, parse_upstream_list, Upstream, UpstreamPool,
};
use crate::odoh::OdohConfigCache;
use crate::override_store::OverrideStore;
use crate::query_log::QueryLog;
use crate::service_store::ServiceStore;
use crate::stats::{ServerStats, Transport};
use crate::system_dns::discover_system_dns;
const QUAD9_IP: &str = "9.9.9.9";
const DOH_FALLBACK: &str = "https://9.9.9.9/dns-query";
/// Boot the DNS server and run until the UDP listener errors out.
pub async fn run(config_path: String) -> crate::Result<()> {
let ConfigLoad {
config,
path: resolved_config_path,
found: config_found,
} = load_config(&config_path)?;
// Discover system DNS in a single pass (upstream + forwarding rules)
let system_dns = discover_system_dns();
let root_hints = crate::recursive::parse_root_hints(&config.upstream.root_hints);
let recursive_pool = || {
let dummy = UpstreamPool::new(vec![Upstream::Udp("0.0.0.0:0".parse().unwrap())], vec![]);
(dummy, "recursive (root hints)".to_string())
};
let (resolved_mode, upstream_auto, pool, upstream_label) = match config.upstream.mode {
crate::config::UpstreamMode::Auto => {
info!("auto mode: probing recursive resolution...");
if crate::recursive::probe_recursive(&root_hints).await {
info!("recursive probe succeeded — self-sovereign mode");
let (pool, label) = recursive_pool();
(crate::config::UpstreamMode::Recursive, false, pool, label)
} else {
log::warn!("recursive probe failed — falling back to Quad9 DoH");
let client = build_https_client();
let url = DOH_FALLBACK.to_string();
let label = url.clone();
let pool = UpstreamPool::new(vec![Upstream::Doh { url, client }], vec![]);
(crate::config::UpstreamMode::Forward, false, pool, label)
}
}
crate::config::UpstreamMode::Recursive => {
let (pool, label) = recursive_pool();
(crate::config::UpstreamMode::Recursive, false, pool, label)
}
crate::config::UpstreamMode::Forward => {
let addrs = if config.upstream.address.is_empty() {
let detected = system_dns
.default_upstream
.or_else(crate::system_dns::detect_dhcp_dns)
.unwrap_or_else(|| {
info!("could not detect system DNS, falling back to Quad9 DoH");
DOH_FALLBACK.to_string()
});
vec![detected]
} else {
config.upstream.address.clone()
};
let primary = parse_upstream_list(&addrs, config.upstream.port)?;
let fallback = parse_upstream_list(&config.upstream.fallback, config.upstream.port)?;
let pool = UpstreamPool::new(primary, fallback);
let label = pool.label();
(
crate::config::UpstreamMode::Forward,
config.upstream.address.is_empty(),
pool,
label,
)
}
crate::config::UpstreamMode::Odoh => {
let odoh = config.upstream.odoh_upstream()?;
let client = build_odoh_client(&odoh);
let target_config = Arc::new(OdohConfigCache::new(
odoh.target_host.clone(),
client.clone(),
));
let primary = vec![Upstream::Odoh {
relay_url: odoh.relay_url,
target_path: odoh.target_path,
client,
target_config,
}];
let fallback = if odoh.strict {
Vec::new()
} else {
parse_upstream_list(&config.upstream.fallback, config.upstream.port)?
};
let pool = UpstreamPool::new(primary, fallback);
let label = pool.label();
(crate::config::UpstreamMode::Odoh, false, pool, label)
}
};
let api_port = config.server.api_port;
let mut blocklist = BlocklistStore::new();
for domain in &config.blocking.allowlist {
blocklist.add_to_allowlist(domain);
}
if !config.blocking.enabled {
blocklist.set_enabled(false);
}
// Build service store: config services + persisted user services
let mut service_store = ServiceStore::new();
service_store.insert_from_config("numa", config.server.api_port, Vec::new());
for svc in &config.services {
service_store.insert_from_config(&svc.name, svc.target_port, svc.routes.clone());
}
service_store.load_persisted();
for fwd in &config.forwarding {
for suffix in &fwd.suffix {
info!(
"forwarding .{} to {} (config rule)",
suffix,
fwd.upstream.join(", ")
);
}
}
let forwarding_rules =
crate::config::merge_forwarding_rules(&config.forwarding, system_dns.forwarding_rules)?;
// Resolve data_dir from config, falling back to the platform default.
// Used for TLS CA storage below and stored on ServerCtx for runtime use.
let resolved_data_dir = config
.server
.data_dir
.clone()
.unwrap_or_else(crate::data_dir);
// Build initial TLS config before ServerCtx (so ArcSwap is ready at construction)
let initial_tls = if config.proxy.enabled && config.proxy.tls_port > 0 {
let service_names = service_store.names();
match crate::tls::build_tls_config(
&config.proxy.tld,
&service_names,
Vec::new(),
&resolved_data_dir,
) {
Ok(tls_config) => Some(ArcSwap::from(tls_config)),
Err(e) => {
if let Some(advisory) = crate::tls::try_data_dir_advisory(&e, &resolved_data_dir) {
eprint!("{}", advisory);
} else {
log::warn!("TLS setup failed, HTTPS proxy disabled: {}", e);
}
None
}
}
} else {
None
};
let doh_enabled = initial_tls.is_some();
let health_meta = crate::health::HealthMeta::build(
&resolved_data_dir,
config.dot.enabled,
config.dot.port,
config.mobile.port,
config.dnssec.enabled,
resolved_mode == crate::config::UpstreamMode::Recursive,
config.lan.enabled,
config.blocking.enabled,
doh_enabled,
);
let ca_pem = std::fs::read_to_string(resolved_data_dir.join("ca.pem")).ok();
let socket = match UdpSocket::bind(&config.server.bind_addr).await {
Ok(s) => s,
Err(e) => {
if let Some(advisory) =
crate::system_dns::try_port53_advisory(&config.server.bind_addr, &e)
{
eprint!("{}", advisory);
std::process::exit(1);
}
return Err(e.into());
}
};
let ctx = Arc::new(ServerCtx {
socket,
zone_map: build_zone_map(&config.zones)?,
cache: RwLock::new(DnsCache::new(
config.cache.max_entries,
config.cache.min_ttl,
config.cache.max_ttl,
)),
refreshing: Mutex::new(std::collections::HashSet::new()),
stats: Mutex::new(ServerStats::new()),
overrides: RwLock::new(OverrideStore::new()),
blocklist: RwLock::new(blocklist),
query_log: Mutex::new(QueryLog::new(1000)),
services: Mutex::new(service_store),
lan_peers: Mutex::new(crate::lan::PeerStore::new(config.lan.peer_timeout_secs)),
forwarding_rules,
upstream_pool: Mutex::new(pool),
upstream_auto,
upstream_port: config.upstream.port,
lan_ip: Mutex::new(crate::lan::detect_lan_ip().unwrap_or(std::net::Ipv4Addr::LOCALHOST)),
timeout: Duration::from_millis(config.upstream.timeout_ms),
hedge_delay: resolved_mode.hedge_delay(config.upstream.hedge_ms),
proxy_tld_suffix: if config.proxy.tld.is_empty() {
String::new()
} else {
format!(".{}", config.proxy.tld)
},
proxy_tld: config.proxy.tld.clone(),
lan_enabled: config.lan.enabled,
config_path: resolved_config_path,
config_found,
config_dir: crate::config_dir(),
data_dir: resolved_data_dir,
tls_config: initial_tls,
upstream_mode: resolved_mode,
root_hints,
srtt: std::sync::RwLock::new(crate::srtt::SrttCache::new(config.upstream.srtt)),
inflight: std::sync::Mutex::new(std::collections::HashMap::new()),
dnssec_enabled: config.dnssec.enabled,
dnssec_strict: config.dnssec.strict,
health_meta,
ca_pem,
mobile_enabled: config.mobile.enabled,
mobile_port: config.mobile.port,
filter_aaaa: config.server.filter_aaaa,
});
let zone_count: usize = ctx.zone_map.values().map(|m| m.len()).sum();
// Build banner rows, then size the box to fit the longest value
let api_url = format!("http://localhost:{}", api_port);
let proxy_label = if config.proxy.enabled {
if config.proxy.tls_port > 0 {
Some(format!(
"http://:{} https://:{}",
config.proxy.port, config.proxy.tls_port
))
} else {
Some(format!(
"http://*.{} on :{}",
config.proxy.tld, config.proxy.port
))
}
} else {
None
};
let config_label = if ctx.config_found {
ctx.config_path.clone()
} else {
format!("{} (defaults)", ctx.config_path)
};
let data_label = ctx.data_dir.display().to_string();
let services_label = ctx.config_dir.join("services.json").display().to_string();
// label (10) + value + padding (2) = inner width; minimum 40 for the title row
let val_w = [
config.server.bind_addr.len(),
api_url.len(),
upstream_label.len(),
config_label.len(),
data_label.len(),
services_label.len(),
]
.into_iter()
.chain(proxy_label.as_ref().map(|s| s.len()))
.max()
.unwrap_or(30);
let w = (val_w + 12).max(42); // 10 label + 2 padding, min 42 for title
let o = "\x1b[38;2;192;98;58m"; // orange
let g = "\x1b[38;2;107;124;78m"; // green
let d = "\x1b[38;2;163;152;136m"; // dim
let r = "\x1b[0m"; // reset
let b = "\x1b[1;38;2;192;98;58m"; // bold orange
let it = "\x1b[3;38;2;163;152;136m"; // italic dim
let bar_top = "".repeat(w);
let bar_mid = "".repeat(w);
let row = |label: &str, color: &str, value: &str| {
eprintln!(
"{o}{r} {color}{:<9}{r} {:<vw$}{o}{r}",
label,
value,
vw = w - 12
);
};
// Title row: center within the box
let title = format!(
"{b}NUMA{r} {it}DNS that governs itself{r} {d}v{}{r}",
env!("CARGO_PKG_VERSION")
);
// The title contains ANSI codes; visible length is ~38 chars. Pad to fill the box.
let title_visible_len = 4 + 2 + 24 + 2 + 1 + env!("CARGO_PKG_VERSION").len() + 1;
let title_pad = w.saturating_sub(title_visible_len);
eprintln!("\n{o}{bar_top}{r}");
eprint!("{o}{r} {title}");
eprintln!("{}{o}{r}", " ".repeat(title_pad));
eprintln!("{o}{bar_top}{r}");
row("DNS", g, &config.server.bind_addr);
row("API", g, &api_url);
row("Dashboard", g, &api_url);
row(
"Upstream",
g,
if ctx.upstream_mode == crate::config::UpstreamMode::Recursive {
"recursive (root hints)"
} else {
&upstream_label
},
);
row("Zones", g, &format!("{} records", zone_count));
row(
"Cache",
g,
&format!("max {} entries", config.cache.max_entries),
);
if !config.cache.warm.is_empty() {
row("Warm", g, &format!("{} domains", config.cache.warm.len()));
}
row(
"Blocking",
g,
&if config.blocking.enabled {
format!("{} lists", config.blocking.lists.len())
} else {
"disabled".to_string()
},
);
if let Some(ref label) = proxy_label {
row("Proxy", g, label);
if config.proxy.bind_addr == "127.0.0.1" {
let y = "\x1b[38;2;204;176;59m"; // yellow
row(
"",
y,
&format!(
"⚠ proxy on 127.0.0.1 — .{} not LAN reachable",
config.proxy.tld
),
);
}
}
if config.dot.enabled {
row("DoT", g, &format!("tls://:{}", config.dot.port));
}
if doh_enabled {
row(
"DoH",
g,
&format!("https://:{}/dns-query", config.proxy.tls_port),
);
}
if config.lan.enabled {
row("LAN", g, "mDNS (_numa._tcp.local)");
}
if !ctx.forwarding_rules.is_empty() {
row(
"Routing",
g,
&format!("{} conditional rules", ctx.forwarding_rules.len()),
);
}
eprintln!("{o}{bar_mid}{r}");
row("Config", d, &config_label);
row("Data", d, &data_label);
row("Services", d, &services_label);
eprintln!("{o}{bar_top}{r}\n");
info!(
"numa listening on {}, upstream {}, {} zone records, cache max {}, API on port {}",
config.server.bind_addr, upstream_label, zone_count, config.cache.max_entries, api_port,
);
// Download blocklists on startup
let blocklist_lists = config.blocking.lists.clone();
let refresh_hours = config.blocking.refresh_hours;
if config.blocking.enabled && !blocklist_lists.is_empty() {
let bl_ctx = Arc::clone(&ctx);
let bl_lists = blocklist_lists.clone();
tokio::spawn(async move {
load_blocklists(&bl_ctx, &bl_lists).await;
// Periodic refresh
let mut interval = tokio::time::interval(Duration::from_secs(refresh_hours * 3600));
interval.tick().await; // skip immediate tick
loop {
interval.tick().await;
info!("refreshing blocklists...");
load_blocklists(&bl_ctx, &bl_lists).await;
}
});
}
// Prime TLD cache (recursive mode only)
if ctx.upstream_mode == crate::config::UpstreamMode::Recursive {
let prime_ctx = Arc::clone(&ctx);
let prime_tlds = config.upstream.prime_tlds;
tokio::spawn(async move {
crate::recursive::prime_tld_cache(
&prime_ctx.cache,
&prime_ctx.root_hints,
&prime_tlds,
&prime_ctx.srtt,
)
.await;
});
}
// Spawn cache warming for user-configured domains
if !config.cache.warm.is_empty() {
let warm_ctx = Arc::clone(&ctx);
let warm_domains = config.cache.warm.clone();
tokio::spawn(async move {
cache_warm_loop(warm_ctx, warm_domains).await;
});
}
// Spawn DoH connection keepalive — prevents idle TLS teardown
{
let keepalive_ctx = Arc::clone(&ctx);
tokio::spawn(async move {
doh_keepalive_loop(keepalive_ctx).await;
});
}
// Spawn HTTP API server
let api_ctx = Arc::clone(&ctx);
let api_addr: SocketAddr = format!("{}:{}", config.server.api_bind_addr, api_port).parse()?;
tokio::spawn(async move {
let app = crate::api::router(api_ctx);
let listener = tokio::net::TcpListener::bind(api_addr).await.unwrap();
info!("HTTP API listening on {}", api_addr);
axum::serve(listener, app).await.unwrap();
});
// Spawn Mobile API listener (read-only subset for iOS/Android companion
// apps, LAN-bound by default so phones can reach it). Only idempotent
// GETs; no state-mutating routes are exposed here regardless of
// the main API's bind address.
if config.mobile.enabled {
let mobile_ctx = Arc::clone(&ctx);
let mobile_bind = config.mobile.bind_addr.clone();
let mobile_port = config.mobile.port;
tokio::spawn(async move {
if let Err(e) = crate::mobile_api::start(mobile_ctx, mobile_bind, mobile_port).await {
log::warn!("Mobile API listener failed: {}", e);
}
});
}
let proxy_bind: std::net::Ipv4Addr = config
.proxy
.bind_addr
.parse()
.unwrap_or(std::net::Ipv4Addr::LOCALHOST);
// Spawn HTTP reverse proxy for .numa domains
if config.proxy.enabled {
let proxy_ctx = Arc::clone(&ctx);
let proxy_port = config.proxy.port;
tokio::spawn(async move {
crate::proxy::start_proxy(proxy_ctx, proxy_port, proxy_bind).await;
});
}
// Spawn HTTPS reverse proxy with TLS termination
if config.proxy.enabled && config.proxy.tls_port > 0 && ctx.tls_config.is_some() {
let proxy_ctx = Arc::clone(&ctx);
let tls_port = config.proxy.tls_port;
tokio::spawn(async move {
crate::proxy::start_proxy_tls(proxy_ctx, tls_port, proxy_bind).await;
});
}
// Spawn network change watcher (upstream re-detection, LAN IP update, peer flush)
{
let watch_ctx = Arc::clone(&ctx);
tokio::spawn(async move {
network_watch_loop(watch_ctx).await;
});
}
// Spawn LAN service discovery
if config.lan.enabled {
let lan_ctx = Arc::clone(&ctx);
let lan_config = config.lan.clone();
tokio::spawn(async move {
crate::lan::start_lan_discovery(lan_ctx, &lan_config).await;
});
}
// Spawn DNS-over-TLS listener (RFC 7858)
if config.dot.enabled {
let dot_ctx = Arc::clone(&ctx);
let dot_config = config.dot.clone();
tokio::spawn(async move {
crate::dot::start_dot(dot_ctx, &dot_config).await;
});
}
// UDP DNS listener
#[allow(clippy::infinite_loop)]
loop {
let mut buffer = BytePacketBuffer::new();
let (len, src_addr) = match ctx.socket.recv_from(&mut buffer.buf).await {
Ok(r) => r,
Err(e) if e.kind() == std::io::ErrorKind::ConnectionReset => {
// Windows delivers ICMP port-unreachable as ConnectionReset on UDP sockets
continue;
}
Err(e) => return Err(e.into()),
};
let ctx = Arc::clone(&ctx);
tokio::spawn(async move {
if let Err(e) = handle_query(buffer, len, src_addr, &ctx, Transport::Udp).await {
error!("{} | HANDLER ERROR | {}", src_addr, e);
}
});
}
}
async fn network_watch_loop(ctx: Arc<ServerCtx>) {
let mut tick: u64 = 0;
let mut interval = tokio::time::interval(Duration::from_secs(5));
interval.tick().await; // skip immediate tick
loop {
interval.tick().await;
tick += 1;
let mut changed = false;
// Check LAN IP change (every 5s — cheap, one UDP socket call)
if let Some(new_ip) = crate::lan::detect_lan_ip() {
let mut current_ip = ctx.lan_ip.lock().unwrap();
if new_ip != *current_ip {
info!("LAN IP changed: {} → {}", current_ip, new_ip);
*current_ip = new_ip;
changed = true;
crate::recursive::reset_udp_state();
}
}
// Re-detect upstream every 30s or on LAN IP change (auto-detect only)
if ctx.upstream_auto && (changed || tick.is_multiple_of(6)) {
let dns_info = crate::system_dns::discover_system_dns();
let new_addr = dns_info
.default_upstream
.or_else(crate::system_dns::detect_dhcp_dns)
.unwrap_or_else(|| QUAD9_IP.to_string());
let mut pool = ctx.upstream_pool.lock().unwrap();
if pool.maybe_update_primary(&new_addr, ctx.upstream_port) {
info!("upstream changed → {}", pool.label());
changed = true;
}
}
// Flush stale LAN peers on any network change
if changed {
ctx.lan_peers.lock().unwrap().clear();
info!("flushed LAN peers after network change");
}
// Re-probe UDP every 5 minutes when disabled
if tick.is_multiple_of(60) {
crate::recursive::probe_udp(&ctx.root_hints).await;
}
}
}
async fn load_blocklists(ctx: &ServerCtx, lists: &[String]) {
let downloaded = download_blocklists(lists).await;
// Parse outside the lock to avoid blocking DNS queries during parse (~100ms)
let mut all_domains = std::collections::HashSet::new();
let mut sources = Vec::new();
for (source, text) in &downloaded {
let domains = parse_blocklist(text);
info!("blocklist: {} domains from {}", domains.len(), source);
all_domains.extend(domains);
sources.push(source.clone());
}
let total = all_domains.len();
// Swap under lock — sub-microsecond
ctx.blocklist
.write()
.unwrap()
.swap_domains(all_domains, sources);
info!(
"blocking enabled: {} unique domains from {} lists",
total,
downloaded.len()
);
}
async fn warm_domain(ctx: &ServerCtx, domain: &str) {
for qtype in [
crate::question::QueryType::A,
crate::question::QueryType::AAAA,
] {
crate::ctx::refresh_entry(ctx, domain, qtype).await;
}
}
async fn doh_keepalive_loop(ctx: Arc<ServerCtx>) {
let mut interval = tokio::time::interval(Duration::from_secs(25));
interval.tick().await; // skip first immediate tick
loop {
interval.tick().await;
let pool = ctx.upstream_pool.lock().unwrap().clone();
if let Some(upstream) = pool.preferred() {
crate::forward::keepalive_doh(upstream).await;
}
}
}
async fn cache_warm_loop(ctx: Arc<ServerCtx>, domains: Vec<String>) {
tokio::time::sleep(Duration::from_secs(2)).await;
for domain in &domains {
warm_domain(&ctx, domain).await;
}
info!("cache warm: {} domains resolved at startup", domains.len());
let mut interval = tokio::time::interval(Duration::from_secs(30));
interval.tick().await;
loop {
interval.tick().await;
for domain in &domains {
let refresh = ctx.cache.read().unwrap().needs_warm(domain);
if refresh {
warm_domain(&ctx, domain).await;
}
}
}
}

126
src/setup_phone.rs Normal file
View File

@@ -0,0 +1,126 @@
//! `numa setup-phone` CLI — thin QR wrapper over the persistent mobile API.
//!
//! Before the mobile API existed, this command spawned its own one-shot
//! HTTP server on port 8765 to serve a freshly-generated mobileconfig
//! for a single download. That role now belongs to
//! [`crate::mobile_api`], which runs persistently alongside the main
//! API and serves `/mobileconfig` at the same port whenever Numa is
//! running.
//!
//! This command is now a thin terminal-side wrapper:
//!
//! 1. Detect the current LAN IP
//! 2. Render a terminal QR code pointing at
//! `http://<lan_ip>:8765/mobileconfig`
//! 3. Print install instructions and exit
//!
//! The user scans the QR, iOS fetches the profile from the mobile API
//! (which is always up as long as `numa` is running), installs, and the
//! user walks through Settings → Certificate Trust Settings to enable
//! trust.
//!
//! Numa must be running for the profile download to succeed; if the
//! mobile API is not listening on port 8765, the download will fail
//! and the user will see Safari's "Cannot Connect to Server" error.
//! The CLI prints a reminder about this at the bottom of the output.
use qrcode::render::unicode;
use qrcode::QrCode;
/// Default port where the persistent mobile API serves `/mobileconfig`.
/// Matches `MobileConfig::default().port` in `config.rs`. If the user
/// has overridden `[mobile] port = N` in `numa.toml`, they'll need to
/// adjust the URL manually — this CLI uses the default without parsing
/// `numa.toml`.
const SETUP_PORT: u16 = 8765;
fn render_qr(url: &str) -> Result<String, String> {
let code = QrCode::new(url).map_err(|e| format!("failed to encode QR: {}", e))?;
Ok(code
.render::<unicode::Dense1x2>()
.dark_color(unicode::Dense1x2::Light)
.light_color(unicode::Dense1x2::Dark)
.build())
}
/// Run the `numa setup-phone` flow.
pub async fn run() -> Result<(), String> {
let lan_ip = crate::lan::detect_lan_ip()
.ok_or("could not detect LAN IP — are you connected to a network?")?;
let addr = std::net::SocketAddr::from(([127, 0, 0, 1], SETUP_PORT));
let api_reachable = tokio::time::timeout(
std::time::Duration::from_millis(500),
tokio::net::TcpStream::connect(addr),
)
.await
.map(|r| r.is_ok())
.unwrap_or(false);
if !api_reachable {
eprintln!();
eprintln!(
" \x1b[1;38;2;192;98;58mNuma\x1b[0m — mobile API is not reachable on port {}.",
SETUP_PORT
);
eprintln!();
eprintln!(" The phone won't be able to download the profile until the mobile");
eprintln!(" API is running. Add this to your numa.toml and restart Numa:");
eprintln!();
eprintln!(" [mobile]");
eprintln!(" enabled = true");
eprintln!();
return Err("mobile API not running".into());
}
let url = format!("http://{}:{}/mobileconfig", lan_ip, SETUP_PORT);
let qr = render_qr(&url)?;
eprintln!();
eprintln!(" \x1b[1;38;2;192;98;58mNuma Phone Setup\x1b[0m");
eprintln!();
eprintln!(" Profile URL: \x1b[36m{}\x1b[0m", url);
eprintln!();
for line in qr.lines() {
eprintln!(" {}", line);
}
eprintln!();
eprintln!(" \x1b[1mOn your iPhone:\x1b[0m");
eprintln!(" 1. Open Camera, point at the QR code, tap the yellow banner");
eprintln!(" 2. Allow the download when Safari asks");
eprintln!(" 3. Open Settings — tap \"Profile Downloaded\" near the top");
eprintln!(" (or: Settings → General → VPN & Device Management → Numa DNS)");
eprintln!(" 4. Tap Install (top right), enter passcode, Install again");
eprintln!(" 5. \x1b[1mSettings → General → About → Certificate Trust Settings\x1b[0m");
eprintln!(" Toggle ON \"Numa Local CA\" — required for DoT to work");
eprintln!();
eprintln!(
" \x1b[33mNote:\x1b[0m profile uses your laptop's current IP ({}). If your",
lan_ip
);
eprintln!(" laptop changes networks, re-scan this QR — iOS will replace the");
eprintln!(" existing profile automatically (fixed UUID).");
eprintln!();
eprintln!(
" \x1b[90mThe profile is served by Numa's persistent mobile API on port {}.\x1b[0m",
SETUP_PORT
);
eprintln!(" \x1b[90mMake sure `numa` is running before scanning. If it's not,\x1b[0m");
eprintln!(" \x1b[90mstart it with `sudo numa install` or run it interactively.\x1b[0m");
eprintln!();
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn render_qr_produces_unicode() {
let qr = render_qr("http://192.168.1.9:8765/mobileconfig").unwrap();
assert!(!qr.is_empty());
// Dense1x2 uses these block characters
assert!(qr.chars().any(|c| matches!(c, '█' | '▀' | '▄' | ' ')));
}
}

View File

@@ -45,6 +45,11 @@ impl SrttCache {
}
}
/// Whether we have observed RTT data for this IP.
pub fn is_known(&self, ip: IpAddr) -> bool {
self.entries.contains_key(&ip)
}
/// Apply time-based decay: each DECAY_AFTER_SECS period halves distance to INITIAL.
fn decayed_srtt(entry: &SrttEntry) -> u64 {
Self::decay_for_age(entry.srtt_ms, entry.updated_at.elapsed().as_secs())

View File

@@ -90,6 +90,7 @@ fn linux_rss() -> usize {
pub struct ServerStats {
queries_total: u64,
queries_forwarded: u64,
queries_upstream: u64,
queries_recursive: u64,
queries_coalesced: u64,
queries_cached: u64,
@@ -97,14 +98,69 @@ pub struct ServerStats {
queries_local: u64,
queries_overridden: u64,
upstream_errors: u64,
transport_udp: u64,
transport_tcp: u64,
transport_dot: u64,
transport_doh: u64,
upstream_transport_udp: u64,
upstream_transport_doh: u64,
upstream_transport_dot: u64,
upstream_transport_odoh: u64,
started_at: Instant,
}
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Transport {
Udp,
Tcp,
Dot,
Doh,
}
impl Transport {
pub fn as_str(&self) -> &'static str {
match self {
Transport::Udp => "UDP",
Transport::Tcp => "TCP",
Transport::Dot => "DOT",
Transport::Doh => "DOH",
}
}
}
/// Wire protocol used for a forwarded upstream call. Orthogonal to
/// `QueryPath`: the path answers "where the answer came from"; this answers
/// "over what wire we spoke to the forwarder." Callers pass
/// `Option<UpstreamTransport>` — `None` for resolutions that never touched
/// a forwarder (cache/local/blocked) or for recursive mode, which has its
/// own counter via `QueryPath::Recursive`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum UpstreamTransport {
Udp,
Doh,
Dot,
Odoh,
}
impl UpstreamTransport {
pub fn as_str(&self) -> &'static str {
match self {
UpstreamTransport::Udp => "UDP",
UpstreamTransport::Doh => "DOH",
UpstreamTransport::Dot => "DOT",
UpstreamTransport::Odoh => "ODOH",
}
}
}
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum QueryPath {
Local,
Cached,
/// Matched a `[[forwarding]]` suffix rule.
Forwarded,
/// Resolved via the default `[upstream]` pool (no suffix match).
Upstream,
Recursive,
Coalesced,
Blocked,
@@ -118,6 +174,7 @@ impl QueryPath {
QueryPath::Local => "LOCAL",
QueryPath::Cached => "CACHED",
QueryPath::Forwarded => "FORWARD",
QueryPath::Upstream => "UPSTREAM",
QueryPath::Recursive => "RECURSIVE",
QueryPath::Coalesced => "COALESCED",
QueryPath::Blocked => "BLOCKED",
@@ -133,6 +190,8 @@ impl QueryPath {
Some(QueryPath::Cached)
} else if s.eq_ignore_ascii_case("FORWARD") {
Some(QueryPath::Forwarded)
} else if s.eq_ignore_ascii_case("UPSTREAM") {
Some(QueryPath::Upstream)
} else if s.eq_ignore_ascii_case("RECURSIVE") {
Some(QueryPath::Recursive)
} else if s.eq_ignore_ascii_case("COALESCED") {
@@ -160,6 +219,7 @@ impl ServerStats {
ServerStats {
queries_total: 0,
queries_forwarded: 0,
queries_upstream: 0,
queries_recursive: 0,
queries_coalesced: 0,
queries_cached: 0,
@@ -167,22 +227,50 @@ impl ServerStats {
queries_local: 0,
queries_overridden: 0,
upstream_errors: 0,
transport_udp: 0,
transport_tcp: 0,
transport_dot: 0,
transport_doh: 0,
upstream_transport_udp: 0,
upstream_transport_doh: 0,
upstream_transport_dot: 0,
upstream_transport_odoh: 0,
started_at: Instant::now(),
}
}
pub fn record(&mut self, path: QueryPath) -> u64 {
pub fn record(
&mut self,
path: QueryPath,
transport: Transport,
upstream_transport: Option<UpstreamTransport>,
) -> u64 {
self.queries_total += 1;
match path {
QueryPath::Local => self.queries_local += 1,
QueryPath::Cached => self.queries_cached += 1,
QueryPath::Forwarded => self.queries_forwarded += 1,
QueryPath::Upstream => self.queries_upstream += 1,
QueryPath::Recursive => self.queries_recursive += 1,
QueryPath::Coalesced => self.queries_coalesced += 1,
QueryPath::Blocked => self.queries_blocked += 1,
QueryPath::Overridden => self.queries_overridden += 1,
QueryPath::UpstreamError => self.upstream_errors += 1,
}
match transport {
Transport::Udp => self.transport_udp += 1,
Transport::Tcp => self.transport_tcp += 1,
Transport::Dot => self.transport_dot += 1,
Transport::Doh => self.transport_doh += 1,
}
if let Some(ut) = upstream_transport {
match ut {
UpstreamTransport::Udp => self.upstream_transport_udp += 1,
UpstreamTransport::Doh => self.upstream_transport_doh += 1,
UpstreamTransport::Dot => self.upstream_transport_dot += 1,
UpstreamTransport::Odoh => self.upstream_transport_odoh += 1,
}
}
self.queries_total
}
@@ -199,6 +287,7 @@ impl ServerStats {
uptime_secs: self.uptime_secs(),
total: self.queries_total,
forwarded: self.queries_forwarded,
upstream: self.queries_upstream,
recursive: self.queries_recursive,
coalesced: self.queries_coalesced,
cached: self.queries_cached,
@@ -206,6 +295,14 @@ impl ServerStats {
overridden: self.queries_overridden,
blocked: self.queries_blocked,
errors: self.upstream_errors,
transport_udp: self.transport_udp,
transport_tcp: self.transport_tcp,
transport_dot: self.transport_dot,
transport_doh: self.transport_doh,
upstream_transport_udp: self.upstream_transport_udp,
upstream_transport_doh: self.upstream_transport_doh,
upstream_transport_dot: self.upstream_transport_dot,
upstream_transport_odoh: self.upstream_transport_odoh,
}
}
@@ -216,10 +313,11 @@ impl ServerStats {
let secs = uptime.as_secs() % 60;
log::info!(
"STATS | uptime {}h{}m{}s | total {} | fwd {} | recursive {} | coalesced {} | cached {} | local {} | override {} | blocked {} | errors {}",
"STATS | uptime {}h{}m{}s | total {} | fwd {} | upstream {} | recursive {} | coalesced {} | cached {} | local {} | override {} | blocked {} | errors {} | up-udp {} | up-doh {} | up-dot {} | up-odoh {}",
hours, mins, secs,
self.queries_total,
self.queries_forwarded,
self.queries_upstream,
self.queries_recursive,
self.queries_coalesced,
self.queries_cached,
@@ -227,6 +325,10 @@ impl ServerStats {
self.queries_overridden,
self.queries_blocked,
self.upstream_errors,
self.upstream_transport_udp,
self.upstream_transport_doh,
self.upstream_transport_dot,
self.upstream_transport_odoh,
);
}
}
@@ -235,6 +337,7 @@ pub struct StatsSnapshot {
pub uptime_secs: u64,
pub total: u64,
pub forwarded: u64,
pub upstream: u64,
pub recursive: u64,
pub coalesced: u64,
pub cached: u64,
@@ -242,4 +345,12 @@ pub struct StatsSnapshot {
pub overridden: u64,
pub blocked: u64,
pub errors: u64,
pub transport_udp: u64,
pub transport_tcp: u64,
pub transport_dot: u64,
pub transport_doh: u64,
pub upstream_transport_udp: u64,
pub upstream_transport_doh: u64,
pub upstream_transport_dot: u64,
pub upstream_transport_odoh: u64,
}

179
src/svcb.rs Normal file
View File

@@ -0,0 +1,179 @@
//! Minimal SVCB/HTTPS (RFC 9460) RDATA parser — just enough to strip
//! the `ipv6hint` SvcParam. Used by the `filter_aaaa` feature so
//! HTTPS-record-aware clients (Chrome ≥103, Firefox, Safari) don't
//! receive v6 address hints on IPv4-only networks.
/// SvcParamKey = 6 (RFC 9460 §14.3.2).
const IPV6_HINT_KEY: u16 = 6;
/// Strip the `ipv6hint` SvcParam from an HTTPS/SVCB RDATA blob.
///
/// Returns `Some(new_rdata)` if `ipv6hint` was present and removed.
/// Returns `None` if the record had no `ipv6hint`, or if the RDATA
/// couldn't be parsed — in both cases the caller should keep the
/// original bytes untouched.
///
/// SVCB RDATA (RFC 9460 §2.2):
/// SvcPriority (u16)
/// TargetName (uncompressed DNS name — labels terminated by 0 octet)
/// SvcParams (series of {u16 key, u16 len, opaque[len] value}, sorted by key)
pub fn strip_ipv6hint(rdata: &[u8]) -> Option<Vec<u8>> {
if rdata.len() < 2 {
return None;
}
let mut pos = 2;
// TargetName — uncompressed per RFC 9460 §2.2
loop {
let len = *rdata.get(pos)? as usize;
pos += 1;
if len == 0 {
break;
}
if len & 0xC0 != 0 {
// Pointer: forbidden in SVCB but defend against a broken upstream.
return None;
}
pos = pos.checked_add(len)?;
if pos > rdata.len() {
return None;
}
}
// Scan params once to decide whether we need to rebuild.
let params_start = pos;
let mut scan = pos;
let mut has_ipv6hint = false;
while scan < rdata.len() {
if scan + 4 > rdata.len() {
return None;
}
let key = u16::from_be_bytes([rdata[scan], rdata[scan + 1]]);
let vlen = u16::from_be_bytes([rdata[scan + 2], rdata[scan + 3]]) as usize;
let end = scan.checked_add(4)?.checked_add(vlen)?;
if end > rdata.len() {
return None;
}
if key == IPV6_HINT_KEY {
has_ipv6hint = true;
}
scan = end;
}
if scan != rdata.len() || !has_ipv6hint {
return None;
}
// Rebuild without ipv6hint, preserving param order (RFC 9460 requires
// ascending key order, which we preserve by filtering in place).
let mut out = Vec::with_capacity(rdata.len());
out.extend_from_slice(&rdata[..params_start]);
let mut pos = params_start;
while pos < rdata.len() {
let key = u16::from_be_bytes([rdata[pos], rdata[pos + 1]]);
let vlen = u16::from_be_bytes([rdata[pos + 2], rdata[pos + 3]]) as usize;
let end = pos + 4 + vlen;
if key != IPV6_HINT_KEY {
out.extend_from_slice(&rdata[pos..end]);
}
pos = end;
}
Some(out)
}
/// Build an SVCB RDATA blob from a priority, target labels, and
/// (key, value) param pairs. Shared by `svcb` unit tests and `ctx`
/// pipeline tests that need to seed the cache with a synthetic HTTPS RR.
#[cfg(test)]
pub(crate) fn build_rdata(priority: u16, target: &[&str], params: &[(u16, Vec<u8>)]) -> Vec<u8> {
let mut out = Vec::new();
out.extend_from_slice(&priority.to_be_bytes());
for label in target {
out.push(label.len() as u8);
out.extend_from_slice(label.as_bytes());
}
out.push(0);
for (key, value) in params {
out.extend_from_slice(&key.to_be_bytes());
out.extend_from_slice(&(value.len() as u16).to_be_bytes());
out.extend_from_slice(value);
}
out
}
#[cfg(test)]
mod tests {
use super::*;
fn alpn_h3() -> (u16, Vec<u8>) {
// alpn = ["h3"]: one length-prefixed ALPN id
(1, vec![0x02, b'h', b'3'])
}
fn ipv4hint_single() -> (u16, Vec<u8>) {
(4, vec![93, 184, 216, 34])
}
fn ipv6hint_single() -> (u16, Vec<u8>) {
// 2606:4700::1
(
6,
vec![
0x26, 0x06, 0x47, 0x00, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x01,
],
)
}
#[test]
fn strips_ipv6hint_and_keeps_other_params() {
let rdata = build_rdata(1, &[], &[alpn_h3(), ipv4hint_single(), ipv6hint_single()]);
let stripped = strip_ipv6hint(&rdata).expect("ipv6hint present → stripped");
let expected = build_rdata(1, &[], &[alpn_h3(), ipv4hint_single()]);
assert_eq!(stripped, expected);
}
#[test]
fn no_ipv6hint_returns_none() {
let rdata = build_rdata(1, &[], &[alpn_h3(), ipv4hint_single()]);
assert!(strip_ipv6hint(&rdata).is_none());
}
#[test]
fn alias_mode_empty_params_returns_none() {
let rdata = build_rdata(0, &["example", "com"], &[]);
assert!(strip_ipv6hint(&rdata).is_none());
}
#[test]
fn only_ipv6hint_yields_empty_param_section() {
let rdata = build_rdata(1, &[], &[ipv6hint_single()]);
let stripped = strip_ipv6hint(&rdata).expect("ipv6hint present → stripped");
let expected = build_rdata(1, &[], &[]);
assert_eq!(stripped, expected);
}
#[test]
fn preserves_target_name() {
let rdata = build_rdata(1, &["svc", "example", "net"], &[ipv6hint_single()]);
let stripped = strip_ipv6hint(&rdata).unwrap();
assert!(stripped.starts_with(&[0x00, 0x01])); // priority
assert_eq!(&stripped[2..6], b"\x03svc");
}
#[test]
fn truncated_rdata_returns_none() {
// Priority only, no target terminator.
assert!(strip_ipv6hint(&[0, 1, 3, b'c', b'o', b'm']).is_none());
}
#[test]
fn empty_input_returns_none() {
assert!(strip_ipv6hint(&[]).is_none());
}
#[test]
fn param_length_overflow_returns_none() {
// key=6, length=0xFFFF but value is short — malformed.
let rdata = vec![0, 1, 0, 0, 6, 0xFF, 0xFF, 0, 1, 2];
assert!(strip_ipv6hint(&rdata).is_none());
}
}

View File

@@ -2,6 +2,10 @@ use std::net::SocketAddr;
use log::info;
#[cfg(any(target_os = "macos", target_os = "linux"))]
use crate::forward::Upstream;
use crate::forward::UpstreamPool;
fn print_recursive_hint() {
let is_recursive = crate::config::load_config("numa.toml")
.map(|c| c.config.upstream.mode == crate::config::UpstreamMode::Recursive)
@@ -18,11 +22,22 @@ fn is_loopback_or_stub(addr: &str) -> bool {
}
/// A conditional forwarding rule: domains matching `suffix` are forwarded to `upstream`.
#[derive(Debug, Clone)]
#[derive(Clone)]
pub struct ForwardingRule {
pub suffix: String,
dot_suffix: String, // pre-computed ".suffix" for zero-alloc matching
pub upstream: SocketAddr,
pub upstream: UpstreamPool,
}
impl ForwardingRule {
pub fn new(suffix: String, upstream: UpstreamPool) -> Self {
let dot_suffix = format!(".{}", suffix);
Self {
suffix,
dot_suffix,
upstream,
}
}
}
/// Result of system DNS discovery — default upstream + conditional forwarding rules.
@@ -91,7 +106,7 @@ pub fn try_port53_advisory(bind_addr: &str, err: &std::io::Error) -> Option<Stri
sudo numa install (on Windows, run as Administrator)
2. Run on a non-privileged port for testing.
Create ~/.config/numa/numa.toml with:
Create {} with:
[server]
bind_addr = \"127.0.0.1:5354\"
@@ -100,7 +115,8 @@ pub fn try_port53_advisory(bind_addr: &str, err: &std::io::Error) -> Option<Stri
Then run: numa
Test with: dig @127.0.0.1 -p 5354 example.com
"
",
crate::suggested_config_path().display()
))
}
@@ -197,12 +213,13 @@ fn discover_macos() -> SystemDnsInfo {
}
// Sort longest suffix first for most-specific matching
rules.sort_by(|a, b| b.suffix.len().cmp(&a.suffix.len()));
rules.sort_by_key(|r| std::cmp::Reverse(r.suffix.len()));
for rule in &rules {
info!(
"auto-discovered forwarding: *.{} -> {}",
rule.suffix, rule.upstream
rule.suffix,
rule.upstream.label()
);
}
if rules.is_empty() {
@@ -220,12 +237,9 @@ fn discover_macos() -> SystemDnsInfo {
#[cfg(any(target_os = "macos", target_os = "linux"))]
fn make_rule(domain: &str, nameserver: &str) -> Option<ForwardingRule> {
let addr: SocketAddr = format!("{}:53", nameserver).parse().ok()?;
Some(ForwardingRule {
dot_suffix: format!(".{}", domain),
suffix: domain.to_string(),
upstream: addr,
})
let addr = crate::forward::parse_upstream_addr(nameserver, 53).ok()?;
let pool = UpstreamPool::new(vec![Upstream::Udp(addr)], vec![]);
Some(ForwardingRule::new(domain.to_string(), pool))
}
#[cfg(target_os = "linux")]
@@ -562,7 +576,7 @@ fn windows_backup_path() -> std::path::PathBuf {
#[cfg(windows)]
fn disable_dnscache() -> Result<bool, String> {
// Check if Dnscache is running (it holds port 53 at kernel level)
// Check if Dnscache is running (it can hold port 53)
let output = std::process::Command::new("sc")
.args(["query", "Dnscache"])
.output()
@@ -593,8 +607,16 @@ fn disable_dnscache() -> Result<bool, String> {
return Err("failed to disable Dnscache via registry (run as Administrator?)".into());
}
eprintln!(" Dnscache disabled. A reboot is required to free port 53.");
Ok(true)
// Dnscache is disabled for next boot. Check whether port 53 is
// actually blocked right now — on many Windows configurations
// Dnscache doesn't bind port 53 even while running.
let port_blocked = std::net::UdpSocket::bind("127.0.0.1:53").is_err();
if port_blocked {
eprintln!(" Dnscache disabled. A reboot is required to free port 53.");
} else {
eprintln!(" Dnscache disabled. Port 53 is free.");
}
Ok(port_blocked)
}
#[cfg(windows)]
@@ -661,6 +683,83 @@ fn install_windows() -> Result<(), String> {
std::fs::write(&path, json).map_err(|e| format!("failed to write backup: {}", e))?;
}
// On re-install, stop the running service first so the binary can be
// overwritten and port 53 is released for the Dnscache probe.
if is_service_registered() {
eprintln!(" Stopping existing service...");
stop_service_scm();
}
let needs_reboot = disable_dnscache()?;
// Copy the binary to a stable path under ProgramData and register it
// as a real Windows service (SCM-managed, boot-time, auto-restart).
let service_exe = install_service_binary()?;
register_service_scm(&service_exe)?;
if needs_reboot {
// Dnscache still holds port 53 until reboot. Do NOT redirect DNS
// yet — nothing is listening on 127.0.0.1:53, so redirecting now
// would kill DNS. The service will call redirect_dns_to_localhost()
// on its first startup after reboot.
} else {
redirect_dns_with_interfaces(&interfaces)?;
match start_service_scm() {
Ok(_) => eprintln!(" Service started."),
Err(e) => eprintln!(
" warning: service registered but could not start now: {}",
e
),
}
}
eprintln!();
if !has_useful_existing {
eprintln!(" Original DNS saved to {}", path.display());
}
eprintln!(" Run 'numa uninstall' to restore.\n");
if needs_reboot {
eprintln!(" *** Reboot required. Numa will start automatically. ***\n");
} else {
eprintln!(" Numa is running.\n");
}
print_recursive_hint();
Ok(())
}
/// Stable install location for the service binary. SCM keeps a handle to
/// this path; the user's Downloads folder (where `current_exe()` points at
/// install time) is not durable.
#[cfg(windows)]
fn windows_service_exe_path() -> std::path::PathBuf {
crate::data_dir().join("bin").join("numa.exe")
}
/// Run `sc.exe` with the given args and return its merged stdout/stderr on
/// failure. `sc` emits errors on stdout (not stderr) on Windows, so the
/// caller reads stdout to format a useful error.
#[cfg(windows)]
fn run_sc(args: &[&str]) -> Result<std::process::Output, String> {
let out = std::process::Command::new("sc")
.args(args)
.output()
.map_err(|e| format!("failed to run sc {}: {}", args.first().unwrap_or(&""), e))?;
Ok(out)
}
/// Point all active network interfaces at 127.0.0.1 so Numa handles DNS.
/// Called from the service on first boot after a reboot that freed Dnscache.
#[cfg(windows)]
pub fn redirect_dns_to_localhost() -> Result<(), String> {
let interfaces = get_windows_interfaces()?;
redirect_dns_with_interfaces(&interfaces)
}
#[cfg(windows)]
fn redirect_dns_with_interfaces(
interfaces: &std::collections::HashMap<String, WindowsInterfaceDns>,
) -> Result<(), String> {
for name in interfaces.keys() {
let status = std::process::Command::new("netsh")
.args([
@@ -685,63 +784,184 @@ fn install_windows() -> Result<(), String> {
);
}
}
let needs_reboot = disable_dnscache()?;
register_autostart();
eprintln!();
if !has_useful_existing {
eprintln!(" Original DNS saved to {}", path.display());
}
eprintln!(" Run 'numa uninstall' to restore.\n");
if needs_reboot {
eprintln!(" *** Reboot required. Numa will start automatically. ***\n");
} else {
eprintln!(" Numa will start automatically on next boot.\n");
}
print_recursive_hint();
Ok(())
}
/// Register numa to auto-start on boot via registry Run key.
/// Copy the currently-running binary to the service install location. SCM
/// keeps a handle to this path, so it must be stable across user sessions.
#[cfg(windows)]
fn register_autostart() {
let exe = std::env::current_exe()
.map(|p| p.to_string_lossy().to_string())
.unwrap_or_else(|_| "numa".into());
let _ = std::process::Command::new("reg")
.args([
"add",
"HKLM\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run",
"/v",
"Numa",
"/t",
"REG_SZ",
"/d",
&exe,
"/f",
])
.status();
eprintln!(" Registered auto-start on boot.");
fn install_service_binary() -> Result<std::path::PathBuf, String> {
let src = std::env::current_exe().map_err(|e| format!("current_exe(): {}", e))?;
let dst = windows_service_exe_path();
if let Some(parent) = dst.parent() {
std::fs::create_dir_all(parent)
.map_err(|e| format!("failed to create {}: {}", parent.display(), e))?;
}
// Copy only if source and destination differ; running the binary from
// its install location is a supported (re-install) case.
if src != dst {
std::fs::copy(&src, &dst).map_err(|e| {
format!(
"failed to copy {} -> {}: {}",
src.display(),
dst.display(),
e
)
})?;
}
Ok(dst)
}
/// Remove numa auto-start registry key.
/// Remove the service binary on uninstall. Ignore failures — the service
/// is already deleted; a leftover file in ProgramData is not a hard error.
#[cfg(windows)]
fn remove_autostart() {
let _ = std::process::Command::new("reg")
.args([
"delete",
"HKLM\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run",
"/v",
"Numa",
"/f",
])
.status();
fn remove_service_binary() {
let _ = std::fs::remove_file(windows_service_exe_path());
}
/// Register numa with the Service Control Manager, boot-time auto-start,
/// LocalSystem context, with a failure policy of restart-after-5s.
#[cfg(windows)]
fn register_service_scm(exe: &std::path::Path) -> Result<(), String> {
let bin_path = format!("\"{}\" --service", exe.display());
let name = crate::windows_service::SERVICE_NAME;
// sc.exe uses a leading space as its `name= value` delimiter; the space
// after `=` is mandatory.
let create = run_sc(&[
"create",
name,
"binPath=",
&bin_path,
"DisplayName=",
"Numa DNS",
"start=",
"auto",
"obj=",
"LocalSystem",
])?;
if !create.status.success() {
let out = String::from_utf8_lossy(&create.stdout);
// "service already exists" is 1073 — treat as idempotent success.
if !out.contains("1073") {
return Err(format!("sc create failed: {}", out.trim()));
}
}
let _ = run_sc(&[
"description",
name,
"Self-sovereign DNS resolver (ad blocking, DoH/DoT, local zones).",
]);
// Restart on crash: 5s, 5s, 10s; reset failure counter after 60s.
let _ = run_sc(&[
"failure",
name,
"reset=",
"60",
"actions=",
"restart/5000/restart/5000/restart/10000",
]);
eprintln!(" Registered service '{}' (boot-time).", name);
Ok(())
}
/// Start the service. Safe to call on a freshly-registered service — SCM
/// will fail with 1056 ("already running") or 1058 ("disabled") and we
/// return the underlying error string rather than masking it.
#[cfg(windows)]
fn start_service_scm() -> Result<(), String> {
let out = run_sc(&["start", crate::windows_service::SERVICE_NAME])?;
if !out.status.success() {
let text = String::from_utf8_lossy(&out.stdout);
if text.contains("1056") {
return Ok(()); // already running
}
return Err(format!("sc start failed: {}", text.trim()));
}
Ok(())
}
/// Stop the service and wait for it to fully exit. Idempotent —
/// already-stopped or missing service is not an error.
#[cfg(windows)]
fn stop_service_scm() {
let name = crate::windows_service::SERVICE_NAME;
let _ = run_sc(&["stop", name]);
// Wait up to 10s for the service to reach STOPPED state so the
// binary file handle is released before we try to overwrite it.
for _ in 0..20 {
if let Ok(out) = run_sc(&["query", name]) {
let text = String::from_utf8_lossy(&out.stdout);
if text.contains("STOPPED") || text.contains("1060") {
return;
}
}
std::thread::sleep(std::time::Duration::from_millis(500));
}
eprintln!(" warning: service did not stop within 10s");
}
/// Remove the service from SCM. Idempotent — see `stop_service_scm`.
#[cfg(windows)]
fn delete_service_scm() {
if let Err(e) = run_sc(&["delete", crate::windows_service::SERVICE_NAME]) {
log::warn!("sc delete failed: {}", e);
}
}
/// Check whether the service is registered with SCM (regardless of state).
#[cfg(windows)]
fn is_service_registered() -> bool {
run_sc(&["query", crate::windows_service::SERVICE_NAME])
.map(|o| parse_sc_registered(o.status.success(), &String::from_utf8_lossy(&o.stdout)))
.unwrap_or(false)
}
/// Parse `sc query` output to determine if a service is registered.
/// Extracted for testability — the actual `sc` call is in `is_service_registered`.
#[cfg(any(windows, test))]
fn parse_sc_registered(exit_success: bool, stdout: &str) -> bool {
if exit_success {
return true;
}
// Error 1060 = "The specified service does not exist as an installed service."
!stdout.contains("1060")
}
/// Print service state from SCM.
#[cfg(windows)]
fn service_status_windows() -> Result<(), String> {
let out = run_sc(&["query", crate::windows_service::SERVICE_NAME])?;
let text = String::from_utf8_lossy(&out.stdout);
let display = parse_sc_state(&text);
eprintln!(" {}\n", display);
Ok(())
}
/// Parse the STATE line from `sc query` output. Returns a human-readable
/// string like "STATE : 4 RUNNING" or "Service is not installed."
#[cfg(any(windows, test))]
fn parse_sc_state(sc_output: &str) -> String {
if sc_output.contains("1060") {
return "Service is not installed.".to_string();
}
sc_output
.lines()
.find(|l| l.contains("STATE"))
.map(|l| l.trim().to_string())
.unwrap_or_else(|| "unknown".to_string())
}
#[cfg(windows)]
fn uninstall_windows() -> Result<(), String> {
remove_autostart();
// Stop + remove the service before touching DNS, so port 53 is released
// cleanly and the failure-restart policy doesn't resurrect it.
stop_service_scm();
delete_service_scm();
remove_service_binary();
let path = windows_backup_path();
let json = std::fs::read_to_string(&path)
.map_err(|e| format!("no backup found at {}: {}", path.display(), e))?;
@@ -814,10 +1034,13 @@ fn uninstall_windows() -> Result<(), String> {
/// Find the upstream for a domain by checking forwarding rules.
/// Returns None if no rule matches (use default upstream).
/// Zero-allocation on the hot path — dot_suffix is pre-computed.
pub fn match_forwarding_rule(domain: &str, rules: &[ForwardingRule]) -> Option<SocketAddr> {
pub fn match_forwarding_rule<'a>(
domain: &str,
rules: &'a [ForwardingRule],
) -> Option<&'a UpstreamPool> {
for rule in rules {
if domain == rule.suffix || domain.ends_with(&rule.dot_suffix) {
return Some(rule.upstream);
return Some(&rule.upstream);
}
}
None
@@ -1035,6 +1258,62 @@ pub fn install_service() -> Result<(), String> {
result
}
/// Start the service. If already installed, just starts it via the platform
/// service manager. If not installed, falls through to a full install.
pub fn start_service() -> Result<(), String> {
#[cfg(target_os = "macos")]
{
install_service()
}
#[cfg(target_os = "linux")]
{
install_service()
}
#[cfg(windows)]
{
if is_service_registered() {
start_service_scm()?;
eprintln!(" Service started.\n");
Ok(())
} else {
install_service()
}
}
#[cfg(not(any(target_os = "macos", target_os = "linux", windows)))]
{
Err("service start not supported on this OS".to_string())
}
}
/// Stop the service without uninstalling it.
pub fn stop_service() -> Result<(), String> {
#[cfg(target_os = "macos")]
{
uninstall_service()
}
#[cfg(target_os = "linux")]
{
uninstall_service()
}
#[cfg(windows)]
{
let out = run_sc(&["stop", crate::windows_service::SERVICE_NAME])?;
if !out.status.success() {
let text = String::from_utf8_lossy(&out.stdout);
// 1062 = not started, 1060 = does not exist
if !text.contains("1062") && !text.contains("1060") {
return Err(format!("sc stop failed: {}", text.trim()));
}
}
eprintln!(" Service stopped.\n");
Ok(())
}
#[cfg(not(any(target_os = "macos", target_os = "linux", windows)))]
{
Err("service stop not supported on this OS".to_string())
}
}
/// Uninstall the Numa system service.
pub fn uninstall_service() -> Result<(), String> {
let _ = untrust_ca();
@@ -1104,7 +1383,14 @@ pub fn restart_service() -> Result<(), String> {
eprintln!(" Service restarted → {}\n", version);
Ok(())
}
#[cfg(not(any(target_os = "macos", target_os = "linux")))]
#[cfg(windows)]
{
stop_service_scm();
start_service_scm()?;
eprintln!(" Service restarted.\n");
Ok(())
}
#[cfg(not(any(target_os = "macos", target_os = "linux", windows)))]
{
Err("service restart not supported on this OS".to_string())
}
@@ -1120,13 +1406,17 @@ pub fn service_status() -> Result<(), String> {
{
service_status_linux()
}
#[cfg(not(any(target_os = "macos", target_os = "linux")))]
#[cfg(windows)]
{
service_status_windows()
}
#[cfg(not(any(target_os = "macos", target_os = "linux", windows)))]
{
Err("service status not supported on this OS".to_string())
}
}
#[cfg(any(target_os = "macos", target_os = "linux"))]
#[cfg(target_os = "macos")]
fn replace_exe_path(service: &str) -> Result<String, String> {
let exe_path =
std::env::current_exe().map_err(|e| format!("failed to get current exe: {}", e))?;
@@ -1374,10 +1664,78 @@ fn uninstall_linux() -> Result<(), String> {
Ok(())
}
/// Fallback install location when current_exe() sits on a path the
/// dynamic user cannot traverse (e.g. `/home/<user>/` mode 0700).
#[cfg(target_os = "linux")]
fn linux_service_exe_path() -> std::path::PathBuf {
std::path::PathBuf::from("/usr/local/bin/numa")
}
/// True iff every ancestor of `p` (excluding `/`) grants world-execute —
/// i.e. the `DynamicUser=yes` service account can traverse the path and
/// exec the binary without being in any group. Linuxbrew's
/// `/home/linuxbrew` is 0755 (traversable, keep brew's path, upgrades
/// via `brew` propagate). A build tree under `/home/<user>/` (0700) or
/// `~/.cargo/bin/` is not (copy to /usr/local/bin so systemd can reach it).
#[cfg(target_os = "linux")]
fn path_world_traversable_linux(p: &std::path::Path) -> bool {
use std::os::unix::fs::PermissionsExt;
let mut current = p;
while let Some(parent) = current.parent() {
if parent.as_os_str().is_empty() || parent == std::path::Path::new("/") {
break;
}
match std::fs::metadata(parent) {
Ok(m) if m.permissions().mode() & 0o001 != 0 => {}
_ => return false,
}
current = parent;
}
true
}
#[cfg(target_os = "linux")]
fn install_service_binary_linux() -> Result<std::path::PathBuf, String> {
let src = std::env::current_exe().map_err(|e| format!("current_exe(): {}", e))?;
if path_world_traversable_linux(&src) {
return Ok(src);
}
let dst = linux_service_exe_path();
if src == dst {
return Ok(dst);
}
if let Some(parent) = dst.parent() {
std::fs::create_dir_all(parent)
.map_err(|e| format!("failed to create {}: {}", parent.display(), e))?;
}
// Atomic replace via temp + rename. Plain copy fails with ETXTBSY when
// re-installing while the service is running the previous binary —
// rename swaps the path while the running process keeps the old inode.
let tmp = dst.with_extension("new");
std::fs::copy(&src, &tmp).map_err(|e| {
format!(
"failed to copy {} -> {}: {}",
src.display(),
tmp.display(),
e
)
})?;
std::fs::rename(&tmp, &dst).map_err(|e| {
let _ = std::fs::remove_file(&tmp);
format!(
"failed to rename {} -> {}: {}",
tmp.display(),
dst.display(),
e
)
})?;
Ok(dst)
}
#[cfg(target_os = "linux")]
fn install_service_linux() -> Result<(), String> {
let unit = include_str!("../numa.service");
let unit = replace_exe_path(unit)?;
let exe = install_service_binary_linux()?;
let unit = include_str!("../numa.service").replace("{{exe_path}}", &exe.to_string_lossy());
std::fs::write(SYSTEMD_UNIT, unit)
.map_err(|e| format!("failed to write {}: {}", SYSTEMD_UNIT, e))?;
@@ -1389,7 +1747,9 @@ fn install_service_linux() -> Result<(), String> {
eprintln!(" warning: failed to configure system DNS: {}", e);
}
run_systemctl(&["start", "numa"])?;
// restart, not start: on re-install the service is already running
// the previous binary; restart picks up the new one.
run_systemctl(&["restart", "numa"])?;
eprintln!(" Service installed and started.");
eprintln!(" Numa will auto-start on boot and restart if killed.");
@@ -1705,22 +2065,25 @@ Wireless LAN adapter Wi-Fi:
}
#[test]
#[cfg(any(target_os = "macos", target_os = "linux"))]
fn replace_exe_path_substitutes_template() {
fn install_templates_contain_exe_path_placeholder() {
// Both files are substituted at install time — plist via
// replace_exe_path on macOS, numa.service via inline .replace
// in install_service_linux. Catch placeholder removal early.
let plist = include_str!("../com.numa.dns.plist");
let unit = include_str!("../numa.service");
assert!(plist.contains("{{exe_path}}"), "plist missing placeholder");
assert!(
unit.contains("{{exe_path}}"),
"unit file missing placeholder"
);
}
#[test]
#[cfg(target_os = "macos")]
fn replace_exe_path_substitutes_template() {
let plist = include_str!("../com.numa.dns.plist");
let result = replace_exe_path(plist).expect("replace_exe_path failed for plist");
assert!(!result.contains("{{exe_path}}"));
let result = replace_exe_path(unit).expect("replace_exe_path failed for unit");
assert!(!result.contains("{{exe_path}}"));
}
#[test]
@@ -1854,4 +2217,57 @@ Wireless LAN adapter Wi-Fi:
let err = std::io::Error::from(std::io::ErrorKind::AddrInUse);
assert!(try_port53_advisory("not-an-address", &err).is_none());
}
#[test]
fn sc_query_running_service_is_registered() {
assert!(parse_sc_registered(true, ""));
}
#[test]
fn sc_query_stopped_service_is_registered() {
let output = "SERVICE_NAME: Numa\n TYPE: 10 WIN32_OWN\n STATE: 1 STOPPED\n";
assert!(parse_sc_registered(true, output));
}
#[test]
fn sc_query_missing_service_not_registered() {
let output = "[SC] EnumQueryServicesStatus:OpenService FAILED 1060:\n\nThe specified service does not exist as an installed service.\n";
assert!(!parse_sc_registered(false, output));
}
#[test]
fn sc_query_other_error_assumes_registered() {
// Permission denied or other errors — don't assume unregistered.
let output = "[SC] OpenService FAILED 5:\n\nAccess is denied.\n";
assert!(parse_sc_registered(false, output));
}
#[test]
fn parse_sc_state_running() {
let output = "SERVICE_NAME: Numa\n TYPE : 10 WIN32_OWN_PROCESS\n STATE : 4 RUNNING\n WIN32_EXIT_CODE : 0\n";
assert!(parse_sc_state(output).contains("RUNNING"));
}
#[test]
fn parse_sc_state_stopped() {
let output = "SERVICE_NAME: Numa\n TYPE : 10 WIN32_OWN_PROCESS\n STATE : 1 STOPPED\n";
assert!(parse_sc_state(output).contains("STOPPED"));
}
#[test]
fn parse_sc_state_not_installed() {
let output = "[SC] EnumQueryServicesStatus:OpenService FAILED 1060:\n\n";
assert_eq!(parse_sc_state(output), "Service is not installed.");
}
#[test]
fn parse_sc_state_empty_output() {
assert_eq!(parse_sc_state(""), "unknown");
}
#[cfg(windows)]
#[test]
fn windows_config_dir_equals_data_dir() {
assert_eq!(crate::config_dir(), crate::data_dir());
}
}

96
src/testutil.rs Normal file
View File

@@ -0,0 +1,96 @@
use std::collections::{HashMap, HashSet};
use std::net::{Ipv4Addr, SocketAddr};
use std::path::PathBuf;
use std::sync::{Mutex, RwLock};
use std::time::Duration;
use tokio::net::UdpSocket;
use crate::blocklist::BlocklistStore;
use crate::buffer::BytePacketBuffer;
use crate::cache::DnsCache;
use crate::config::UpstreamMode;
use crate::ctx::ServerCtx;
use crate::forward::{Upstream, UpstreamPool};
use crate::health::HealthMeta;
use crate::lan::PeerStore;
use crate::override_store::OverrideStore;
use crate::packet::DnsPacket;
use crate::query_log::QueryLog;
use crate::service_store::ServiceStore;
use crate::srtt::SrttCache;
use crate::stats::ServerStats;
/// Minimal `ServerCtx` for tests. Override fields after construction
/// (all fields are `pub`), then wrap in `Arc`.
pub async fn test_ctx() -> ServerCtx {
let socket = UdpSocket::bind("127.0.0.1:0").await.unwrap();
ServerCtx {
socket,
zone_map: HashMap::new(),
cache: RwLock::new(DnsCache::new(100, 60, 86400)),
refreshing: Mutex::new(HashSet::new()),
stats: Mutex::new(ServerStats::new()),
overrides: RwLock::new(OverrideStore::new()),
blocklist: RwLock::new(BlocklistStore::new()),
query_log: Mutex::new(QueryLog::new(100)),
services: Mutex::new(ServiceStore::new()),
lan_peers: Mutex::new(PeerStore::new(90)),
forwarding_rules: Vec::new(),
upstream_pool: Mutex::new(UpstreamPool::new(
vec![Upstream::Udp("127.0.0.1:53".parse().unwrap())],
vec![],
)),
upstream_auto: false,
upstream_port: 53,
lan_ip: Mutex::new(Ipv4Addr::LOCALHOST),
timeout: Duration::from_millis(200),
hedge_delay: Duration::ZERO,
proxy_tld: "numa".to_string(),
proxy_tld_suffix: ".numa".to_string(),
lan_enabled: false,
config_path: "/tmp/test-numa.toml".to_string(),
config_found: false,
config_dir: PathBuf::from("/tmp"),
data_dir: PathBuf::from("/tmp"),
tls_config: None,
upstream_mode: UpstreamMode::Forward,
root_hints: Vec::new(),
srtt: RwLock::new(SrttCache::new(true)),
inflight: Mutex::new(HashMap::new()),
dnssec_enabled: false,
dnssec_strict: false,
health_meta: HealthMeta::test_fixture(),
ca_pem: None,
mobile_enabled: false,
mobile_port: 8765,
filter_aaaa: false,
}
}
/// Spawn a UDP socket that replies to the first DNS query with the given
/// response packet (patching the query ID to match). Returns the socket address.
pub async fn mock_upstream(response: DnsPacket) -> SocketAddr {
let sock = UdpSocket::bind("127.0.0.1:0").await.unwrap();
let addr = sock.local_addr().unwrap();
tokio::spawn(async move {
let mut buf = [0u8; 512];
let (_, src) = sock.recv_from(&mut buf).await.unwrap();
let query_id = u16::from_be_bytes([buf[0], buf[1]]);
let mut resp = response;
resp.header.id = query_id;
let mut out = BytePacketBuffer::new();
resp.write(&mut out).unwrap();
sock.send_to(out.filled(), src).await.unwrap();
});
addr
}
/// UDP socket that accepts connections but never replies.
/// Useful as an upstream that triggers timeouts.
pub fn blackhole_upstream() -> SocketAddr {
let sock = std::net::UdpSocket::bind("127.0.0.1:0").unwrap();
let addr = sock.local_addr().unwrap();
// Leak so it stays bound for the duration of the test process.
Box::leak(Box::new(sock));
addr
}

View File

@@ -66,13 +66,14 @@ pub fn try_data_dir_advisory(err: &crate::Error, data_dir: &Path) -> Option<Stri
sudo numa install (on Windows, run as Administrator)
2. Point data_dir at a path you can write.
Create ~/.config/numa/numa.toml with:
Create {} with:
[server]
data_dir = \"/path/you/can/write\"
",
data_dir.display()
data_dir.display(),
crate::suggested_config_path().display()
))
}
@@ -185,8 +186,19 @@ fn generate_service_cert(
}
}
if sans.is_empty() {
return Err("no valid service names for TLS cert".into());
// Loopback IP SANs so browsers can reach DoH at https://127.0.0.1/dns-query
sans.push(SanType::IpAddress(std::net::IpAddr::V4(
std::net::Ipv4Addr::LOCALHOST,
)));
sans.push(SanType::IpAddress(std::net::IpAddr::V6(
std::net::Ipv6Addr::LOCALHOST,
)));
for name in ["localhost", tld] {
match name.to_string().try_into() {
Ok(ia5) => sans.push(SanType::DnsName(ia5)),
Err(e) => warn!("invalid SAN {}: {}", name, e),
}
}
params.subject_alt_names = sans;
@@ -239,4 +251,72 @@ mod tests {
let err: crate::Error = "rcgen failure".into();
assert!(try_data_dir_advisory(&err, &PathBuf::from("/x")).is_none());
}
#[test]
fn service_cert_contains_expected_sans() {
use x509_parser::prelude::GeneralName;
let dir = std::env::temp_dir().join(format!("numa-test-san-{}", std::process::id()));
let _ = std::fs::remove_dir_all(&dir);
let (ca_der, issuer) = ensure_ca(&dir).unwrap();
let names = vec!["grafana".into(), "router".into()];
let (chain, _) = generate_service_cert(&ca_der, &issuer, "numa", &names).unwrap();
assert_eq!(chain.len(), 2, "chain should be [leaf, CA]");
let (_, cert) = x509_parser::parse_x509_certificate(chain[0].as_ref()).unwrap();
let san = cert
.tbs_certificate
.subject_alternative_name()
.unwrap()
.unwrap();
let dns: Vec<&str> = san
.value
.general_names
.iter()
.filter_map(|gn| match gn {
GeneralName::DNSName(s) => Some(*s),
_ => None,
})
.collect();
let ips: Vec<std::net::IpAddr> = san
.value
.general_names
.iter()
.filter_map(|gn| match gn {
GeneralName::IPAddress(b) => match b.len() {
4 => Some(std::net::IpAddr::V4(std::net::Ipv4Addr::new(
b[0], b[1], b[2], b[3],
))),
16 => {
let a: [u8; 16] = (*b).try_into().unwrap();
Some(std::net::IpAddr::V6(std::net::Ipv6Addr::from(a)))
}
_ => None,
},
_ => None,
})
.collect();
// DNS SANs
assert!(dns.contains(&"*.numa"), "missing wildcard SAN");
assert!(dns.contains(&"grafana.numa"), "missing service SAN");
assert!(dns.contains(&"router.numa"), "missing service SAN");
assert!(dns.contains(&"localhost"), "missing localhost SAN");
assert!(dns.contains(&"numa"), "missing bare TLD SAN");
// IP SANs
assert!(
ips.contains(&std::net::IpAddr::V4(std::net::Ipv4Addr::LOCALHOST)),
"missing 127.0.0.1 SAN"
);
assert!(
ips.contains(&std::net::IpAddr::V6(std::net::Ipv6Addr::LOCALHOST)),
"missing ::1 SAN"
);
let _ = std::fs::remove_dir_all(&dir);
}
}

147
src/windows_service.rs Normal file
View File

@@ -0,0 +1,147 @@
//! Windows service wrapper.
//!
//! Lets the `numa.exe` binary act as a real Windows service registered with
//! the Service Control Manager (SCM). Invoked via `numa.exe --service` (the
//! form that `sc create … binPath=` uses).
//!
//! Interactive runs (`numa.exe`, `numa.exe run`, `numa.exe install`) do not
//! go through this module — they keep their existing console-attached
//! behaviour.
use std::ffi::OsString;
use std::sync::mpsc;
use std::time::Duration;
use windows_service::service::{
ServiceControl, ServiceControlAccept, ServiceExitCode, ServiceState, ServiceStatus, ServiceType,
};
use windows_service::service_control_handler::{self, ServiceControlHandlerResult};
use windows_service::{define_windows_service, service_dispatcher};
pub const SERVICE_NAME: &str = "Numa";
define_windows_service!(ffi_service_main, service_main);
/// Entry point the SCM hands control to after `StartServiceCtrlDispatcherW`.
/// Any panic here vanishes silently into the service host — log instead of
/// unwrapping.
fn service_main(_arguments: Vec<OsString>) {
if let Err(e) = run_service() {
log::error!("numa service exited with error: {:?}", e);
}
}
fn run_service() -> windows_service::Result<()> {
let (shutdown_tx, shutdown_rx) = mpsc::channel::<()>();
let event_handler = move |control_event| -> ServiceControlHandlerResult {
match control_event {
ServiceControl::Stop | ServiceControl::Shutdown => {
let _ = shutdown_tx.send(());
ServiceControlHandlerResult::NoError
}
ServiceControl::Interrogate => ServiceControlHandlerResult::NoError,
_ => ServiceControlHandlerResult::NotImplemented,
}
};
let status_handle = service_control_handler::register(SERVICE_NAME, event_handler)?;
status_handle.set_service_status(ServiceStatus {
service_type: ServiceType::OWN_PROCESS,
current_state: ServiceState::Running,
controls_accepted: ServiceControlAccept::STOP | ServiceControlAccept::SHUTDOWN,
exit_code: ServiceExitCode::Win32(0),
checkpoint: 0,
wait_hint: Duration::default(),
process_id: None,
})?;
// Spin up a multi-threaded tokio runtime and run the server on it. A
// dedicated thread runs the runtime so this function can return cleanly
// once the SCM tells us to stop — we can't block the dispatcher thread
// forever without preventing graceful shutdown.
let config_path = service_config_path();
let (server_done_tx, server_done_rx) = mpsc::channel::<()>();
let server_thread = std::thread::spawn(move || {
let runtime = match tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()
{
Ok(rt) => rt,
Err(e) => {
log::error!("failed to build tokio runtime: {}", e);
let _ = server_done_tx.send(());
return;
}
};
if let Err(e) = runtime.block_on(crate::serve::run(config_path)) {
log::error!("numa serve exited with error: {}", e);
}
let _ = server_done_tx.send(());
});
// Wait for the API to be ready, then ensure DNS points at localhost.
// On first boot after install (Dnscache was disabled, reboot freed
// port 53), the installer deferred the DNS redirect — do it now.
let api_up = (0..20).any(|i| {
if i > 0 {
std::thread::sleep(Duration::from_millis(500));
}
std::net::TcpStream::connect(("127.0.0.1", crate::config::DEFAULT_API_PORT)).is_ok()
});
if api_up {
if let Err(e) = crate::system_dns::redirect_dns_to_localhost() {
log::warn!("could not redirect DNS to localhost: {}", e);
}
} else {
log::error!("numa API did not start within 10s — DNS not redirected");
}
// Wait for either SCM stop or server termination.
loop {
if shutdown_rx.recv_timeout(Duration::from_millis(500)).is_ok() {
break;
}
if server_done_rx.try_recv().is_ok() {
break;
}
}
// The server's tokio runtime runs detached inside server_thread. Abandon
// it — the process is about to report Stopped and the SCM will terminate
// us if we linger. Future work: plumb a cancellation signal into
// serve::run() for a clean teardown of listeners and in-flight queries.
drop(server_thread);
status_handle.set_service_status(ServiceStatus {
service_type: ServiceType::OWN_PROCESS,
current_state: ServiceState::Stopped,
controls_accepted: ServiceControlAccept::empty(),
exit_code: ServiceExitCode::Win32(0),
checkpoint: 0,
wait_hint: Duration::default(),
process_id: None,
})?;
Ok(())
}
/// Hand control to the SCM dispatcher. Blocks until the service stops.
/// Call only from the `--service` command path — interactive invocations
/// will hang here waiting for an SCM that isn't talking to them.
pub fn run_as_service() -> windows_service::Result<()> {
service_dispatcher::start(SERVICE_NAME, ffi_service_main)
}
/// Path to the config file used when running under SCM. SCM launches the
/// service with SYSTEM's working directory (usually `C:\Windows\System32`),
/// so a relative `numa.toml` lookup won't find anything meaningful.
fn service_config_path() -> String {
crate::data_dir()
.join("numa.toml")
.to_string_lossy()
.into_owned()
}

1416
src/wire.rs Normal file

File diff suppressed because it is too large Load Diff

5
tests/docker/hold53.py Normal file
View File

@@ -0,0 +1,5 @@
import socket, signal
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 0)
s.bind(("", 53))
signal.pause()

288
tests/docker/install-systemd.sh Executable file
View File

@@ -0,0 +1,288 @@
#!/usr/bin/env bash
#
# Systemd service install verification for the DynamicUser-based Linux
# service unit. Stands up a privileged ubuntu:24.04 container with systemd
# as PID 1, builds numa inside, runs three scenarios that CI does not:
#
# A. Fresh install — every advertised port is not just bound but
# functional (DNS resolves on :53, TLS handshake validates against
# numa's CA on :853/:443, HTTP responds on :80, API on :5380).
# B. Upgrade from pre-drop layout (root-owned /var/lib/numa) preserves
# the CA fingerprint — users' browser-installed CA trust survives.
# C. Install from a 0700 source directory stages the binary under
# /usr/local/bin/numa and the service starts from there.
#
# First run is slow (~5-10 min): image pull + apt + cold cargo build.
# Subsequent runs reuse cached docker volumes for cargo + target (~30s).
#
# Requirements: docker
# Usage: ./tests/docker/install-systemd.sh
set -u
set -o pipefail
GREEN="\033[32m"; RED="\033[31m"; RESET="\033[0m"
pass() { printf " ${GREEN}PASS${RESET}: %s\n" "$*"; }
fail() { printf " ${RED}FAIL${RESET}: %s\n" "$*"; FAIL=1; }
# ============================================================
# Mode B: running inside the systemd container — run scenarios
# ============================================================
if [ "${NUMA_INSIDE:-}" = "1" ]; then
set +e # assertions report pass/fail, don't abort
FAIL=0
NUMA=/work/target/release/numa
reset_state() {
"$NUMA" uninstall >/dev/null 2>&1 || true
systemctl reset-failed numa 2>/dev/null || true
rm -rf /var/lib/numa /var/lib/private/numa /etc/numa /home/builder /usr/local/bin/numa
systemctl daemon-reload 2>/dev/null || true
}
main_pid_user() {
local pid
pid=$(systemctl show -p MainPID --value numa)
[ "$pid" != "0" ] || { echo ""; return; }
ps -o user= -p "$pid" 2>/dev/null | tr -d ' '
}
# MainPID + user briefly stabilize after a fresh restart. Retry so we
# don't race the moment systemd flips the service to "active" vs when
# the forked numa process actually owns MainPID.
assert_nonroot() {
local pid user comm n=0
while [ $n -lt 20 ]; do
pid=$(systemctl show -p MainPID --value numa)
if [ "$pid" != "0" ]; then
comm=$(ps -o comm= -p "$pid" 2>/dev/null | tr -d ' ')
user=$(ps -o user= -p "$pid" 2>/dev/null | tr -d ' ')
if [ "$comm" = "numa" ]; then
if [ "$user" = "root" ]; then
fail "daemon runs as root (expected transient UID)"
else
pass "daemon runs as $user (non-root)"
fi
return
fi
fi
sleep 0.2
n=$((n + 1))
done
fail "numa MainPID did not settle (last: pid=${pid:-?} comm=${comm:-?} user=${user:-?})"
}
# Functional DNS check: just "port 53 bound" isn't enough — systemd-resolved
# listens on 127.0.0.53 and would satisfy a bind test. Retries for ~15s
# to tolerate cold-start upstream / blocklist warmup.
assert_dns_works() {
local n=0
while [ $n -lt 15 ]; do
if dig @127.0.0.1 -p 53 example.com +short +timeout=2 +tries=1 2>/dev/null \
| grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
pass "DNS resolves on :53 (A record returned)"
return
fi
sleep 1
n=$((n + 1))
done
fail "DNS did not return an A record on :53 within 15s"
}
# TLS handshake: cert must validate against numa's CA when connecting
# to a .numa SNI. Catches port-not-bound, wrong cert, missing CA file.
assert_tls_handshake() {
local port=$1 sni=${2:-numa.numa} out
if out=$(openssl s_client -connect "127.0.0.1:${port}" \
-servername "$sni" \
-CAfile /var/lib/numa/ca.pem \
-verify_return_error </dev/null 2>&1); then
if echo "$out" | grep -q 'Verify return code: 0 (ok)'; then
pass "TLS handshake + cert chain verified on :${port}"
else
fail "TLS handshake on :${port} did not report 'Verify return code: 0'"
fi
else
fail "openssl s_client failed connecting to :${port}"
fi
}
assert_http_responds() {
local code
code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 3 http://127.0.0.1/ || echo 000)
if [ "$code" != "000" ]; then
pass "HTTP responds on :80 (status $code)"
else
fail "HTTP :80 connection failed"
fi
}
assert_api_healthy() {
if curl -sf --max-time 3 http://127.0.0.1:5380/health >/dev/null; then
pass "API /health OK on :5380"
else
fail "API /health failed on :5380"
fi
}
ca_fingerprint() {
openssl x509 -in /var/lib/numa/ca.pem -noout -fingerprint -sha256 2>/dev/null \
| sed 's/.*=//'
}
wait_active() {
local n=0
while [ $n -lt 20 ]; do
systemctl is-active --quiet numa && return 0
sleep 0.5
n=$((n + 1))
done
fail "service did not become active within 10s"
systemctl status numa --no-pager -l 2>&1 | head -20 || true
return 1
}
# ---- Scenario A ----
printf "\n=== Scenario A: fresh install — every advertised port is functional ===\n"
reset_state
"$NUMA" install >/tmp/installA.log 2>&1 || { fail "install failed"; tail -20 /tmp/installA.log; }
wait_active || true
assert_nonroot
assert_dns_works
assert_tls_handshake 853
assert_tls_handshake 443
assert_http_responds
assert_api_healthy
# ---- Scenario B ----
# Pre-drop installs left /var/lib/numa as a plain root-owned tree.
# Flattening the current DynamicUser layout back into that shape
# simulates the upgrade path without needing an actual old binary.
printf "\n=== Scenario B: CA fingerprint survives upgrade from pre-drop layout ===\n"
fp_before=$(ca_fingerprint)
if [ -z "$fp_before" ]; then
fail "could not read initial CA fingerprint (skipping scenario B)"
else
echo " CA fingerprint before: $fp_before"
"$NUMA" uninstall >/dev/null 2>&1 || true
tmp=$(mktemp -d)
cp -a /var/lib/private/numa/. "$tmp"/ 2>/dev/null || true
rm -rf /var/lib/numa /var/lib/private/numa
mv "$tmp" /var/lib/numa
chown -R root:root /var/lib/numa
chmod 755 /var/lib/numa
[ -f /var/lib/numa/ca.pem ] || fail "ca.pem missing from seeded legacy tree"
"$NUMA" install >/tmp/installB.log 2>&1 || { fail "upgrade install failed"; tail -20 /tmp/installB.log; }
wait_active || true
assert_nonroot
fp_after=$(ca_fingerprint)
if [ -z "$fp_after" ]; then
fail "could not read CA fingerprint after upgrade"
elif [ "$fp_before" = "$fp_after" ]; then
pass "CA fingerprint preserved across upgrade"
else
fail "CA fingerprint changed: before=$fp_before after=$fp_after"
fi
assert_dns_works
fi
# ---- Scenario C ----
printf "\n=== Scenario C: install from unreachable source stages binary to /usr/local/bin ===\n"
reset_state
mkdir -p /home/builder
chmod 700 /home/builder
cp "$NUMA" /home/builder/numa
chmod 755 /home/builder/numa
/home/builder/numa install >/tmp/installC.log 2>&1 || { fail "install failed"; tail -20 /tmp/installC.log; }
wait_active || true
if [ -x /usr/local/bin/numa ]; then
pass "binary staged to /usr/local/bin/numa"
else
fail "/usr/local/bin/numa missing after install from 0700 source"
fi
exec_line=$(grep '^ExecStart=' /etc/systemd/system/numa.service 2>/dev/null || echo "ExecStart=<unit missing>")
if echo "$exec_line" | grep -q '/usr/local/bin/numa'; then
pass "unit ExecStart points to staged path"
else
fail "unit ExecStart wrong: $exec_line"
fi
assert_nonroot
assert_dns_works
reset_state
rm -rf /home/builder
echo
if [ "$FAIL" -eq 0 ]; then
printf "${GREEN}── all scenarios passed ──${RESET}\n"
exit 0
else
printf "${RED}── some scenarios failed ──${RESET}\n"
exit 1
fi
fi
# ============================================================
# Mode A: host-side bootstrap
# ============================================================
set -e
cd "$(dirname "$0")/../.."
IMAGE=numa-install-systemd:local
CONTAINER="numa-install-systemd-$$"
trap 'docker rm -f "$CONTAINER" >/dev/null 2>&1 || true' EXIT
echo "── building systemd-in-container image (cached after first run) ──"
docker build --quiet -t "$IMAGE" -f - . <<'DOCKERFILE' >/dev/null
FROM ubuntu:24.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update -qq && apt-get install -y -qq \
systemd systemd-sysv systemd-resolved \
ca-certificates curl build-essential \
pkg-config libssl-dev cmake make perl \
dnsutils iproute2 openssl \
&& rm -rf /var/lib/apt/lists/* \
&& for u in dev-hugepages.mount sys-fs-fuse-connections.mount \
systemd-logind.service getty.target console-getty.service; do \
systemctl mask $u; \
done
STOPSIGNAL SIGRTMIN+3
CMD ["/lib/systemd/systemd"]
DOCKERFILE
echo "── starting systemd container ──"
docker run -d --name "$CONTAINER" \
--privileged --cgroupns=host \
--tmpfs /run --tmpfs /run/lock --tmpfs /tmp:exec \
-v "$PWD:/src:ro" \
-v numa-install-systemd-cargo:/root/.cargo \
-v numa-install-systemd-work:/work \
"$IMAGE" >/dev/null
# Wait for systemd to be up
for _ in $(seq 1 30); do
state=$(docker exec "$CONTAINER" systemctl is-system-running 2>&1 || true)
case "$state" in running|degraded) break ;; esac
sleep 0.5
done
echo "── copying source into /work (writable) ──"
docker exec "$CONTAINER" bash -c '
mkdir -p /work
tar -C /src --exclude=./target --exclude=./.git --exclude=./.claude -cf - . | tar -C /work -xf -
'
echo "── rustup + cargo build --release --locked ──"
docker exec "$CONTAINER" bash -c '
set -e
if ! command -v cargo &>/dev/null; then
curl -sSf https://sh.rustup.rs | sh -s -- -y --profile minimal --quiet
fi
. "$HOME/.cargo/env"
cd /work
cargo build --release --locked 2>&1 | tail -5
'
echo "── running scenarios ──"
docker exec -e NUMA_INSIDE=1 "$CONTAINER" bash /src/tests/docker/install-systemd.sh

164
tests/docker/issue-81.sh Executable file
View File

@@ -0,0 +1,164 @@
#!/usr/bin/env bash
#
# End-to-end validation of the issue #81 fix (config path advisory).
#
# Builds numa from two source trees — the buggy baseline and the fix
# candidate — inside one debian:bookworm container, then runs four
# scenarios to prove:
#
# 1. replication/main — reporter's sequence, bug confirmed
# 2. replication/fix — reporter's sequence, bug is gone
# 3. existing/main — pre-installed config at FHS data dir still loads
# 4. existing/fix — same, unchanged by the fix (no regression)
#
# Scenarios 3 and 4 guard against the fear that the fix might change
# candidate order and break existing daemon installs (including the
# macOS Homebrew-prefix layout at /usr/local/var/numa/).
#
# Usage:
# MAIN_SRC=/path/to/main-checkout FIX_SRC=/path/to/fix-worktree \
# ./tests/docker/issue-81.sh
#
# Defaults: MAIN_SRC = $(git rev-parse --show-toplevel), FIX_SRC = same.
set -euo pipefail
MAIN_SRC="${MAIN_SRC:-$(git rev-parse --show-toplevel)}"
FIX_SRC="${FIX_SRC:-$MAIN_SRC}"
GREEN="\033[32m"; RED="\033[31m"; RESET="\033[0m"
echo "── issue #81 validation ──"
echo " main: $MAIN_SRC"
echo " fix: $FIX_SRC"
echo
docker run --rm \
--platform linux/amd64 \
-v "$MAIN_SRC:/main:ro" \
-v "$FIX_SRC:/fix:ro" \
-v "$(dirname "$0")/hold53.py:/tmp/hold53.py:ro" \
-v numa-port53-cargo:/root/.cargo \
-v numa-port53-target:/work/target \
debian:bookworm bash -c '
set -euo pipefail
# Paths and ports used by all scenarios — keep in one place so the
# heredocs and the verdict greps cannot drift.
XDG_CONFIG="/root/.config/numa/numa.toml"
FHS_CONFIG="/var/lib/numa/numa.toml"
TEST_PORT="5354"
TEST_API_PORT="5380"
apt-get update -qq && apt-get install -y -qq curl build-essential python3 2>&1 | tail -1
if ! command -v cargo &>/dev/null; then
curl -sSf https://sh.rustup.rs | sh -s -- -y --profile minimal --quiet
fi
. "$HOME/.cargo/env"
build_from() {
local label="$1"; local src="$2"
mkdir -p "/work/$label"
tar -C "$src" --exclude=./target --exclude=./.git -cf - . | tar -C "/work/$label" -xf -
(cd "/work/$label" && cargo build --release --locked 2>&1 | tail -1)
cp "/work/$label/target/release/numa" "/work/numa-$label"
}
build_from main /main
build_from fix /fix
holder=0
stop_holder() {
if [ "$holder" -ne 0 ]; then
kill "$holder" 2>/dev/null || true
wait "$holder" 2>/dev/null || true
holder=0
fi
}
trap stop_holder EXIT
start_holder() {
python3 /tmp/hold53.py &
holder=$!
sleep 0.3
}
write_test_config() {
local path="$1"
mkdir -p "$(dirname "$path")"
cat > "$path" <<EOF
[server]
bind_addr = "127.0.0.1:$TEST_PORT"
api_port = $TEST_API_PORT
EOF
}
verdict() {
local label="$1"; local expected="$2"; local file="$3"
# "cannot bind to" is printed by the advisory when numa fails to start.
# Its absence is a reliable proxy for "numa bound successfully" because
# the banner-only log we capture contains no other failure surface.
if grep -q "cannot bind to" "$file"; then
echo " [$label] did not bind $TEST_PORT — numa ignored the XDG config"
[ "$expected" = "ignored" ] && return 0 || return 1
else
echo " [$label] bound $TEST_PORT — config loaded"
[ "$expected" = "bound" ] && return 0 || return 1
fi
}
scenario_replication() {
local label="$1"; local bin="/work/numa-$label"; local expected="$2"
echo
echo "════════ REPLICATION / $label ════════"
rm -rf /root/.config/numa /var/lib/numa
mkdir -p "$(dirname "$XDG_CONFIG")"
start_holder
set +e
timeout 5 "$bin" > /tmp/run1.txt 2>&1
set -e
echo "── step 1: advisory printed by $label ──"
grep -E "Create .* with:" /tmp/run1.txt | sed "s/^/ /" || echo " <no advisory line>"
write_test_config "$XDG_CONFIG"
echo "── step 2: wrote config at $XDG_CONFIG ──"
set +e
timeout 3 "$bin" > /tmp/run2.txt 2>&1
set -e
stop_holder
verdict "$label" "$expected" /tmp/run2.txt
}
scenario_existing_install() {
local label="$1"; local bin="/work/numa-$label"
echo
echo "════════ EXISTING INSTALL / $label ════════"
rm -rf /root/.config/numa /var/lib/numa
write_test_config "$FHS_CONFIG"
start_holder
set +e
timeout 3 "$bin" > /tmp/run.txt 2>&1
set -e
stop_holder
verdict "$label" "bound" /tmp/run.txt
}
RC=0
scenario_replication main ignored || RC=1
scenario_replication fix bound || RC=1
scenario_existing_install main || RC=1
scenario_existing_install fix || RC=1
echo
if [ "$RC" -eq 0 ]; then
echo "── all scenarios matched expectations ──"
else
echo "── FAILURE: one or more scenarios diverged ──"
fi
exit $RC
'

View File

@@ -1,7 +1,10 @@
#!/usr/bin/env bash
# Integration test suite for Numa
# Runs a test instance on port 5354, validates all features, exits with status.
# Usage: ./tests/integration.sh [release|debug]
# Usage:
# ./tests/integration.sh [release|debug] # all suites
# SUITES=7 ./tests/integration.sh # only Suite 7
# SUITES=1,3,7 ./tests/integration.sh # Suites 1, 3, and 7
set -euo pipefail
@@ -14,6 +17,14 @@ LOG="/tmp/numa-integration-test.log"
PASSED=0
FAILED=0
# Suite filter: empty runs all; comma list runs a subset.
SUITES="${SUITES:-}"
should_run_suite() {
[ -z "$SUITES" ] && return 0
case ",$SUITES," in *",$1,"*) return 0;; esac
return 1
}
# Colors
GREEN="\033[32m"
RED="\033[31m"
@@ -53,7 +64,17 @@ CONF
echo "Starting Numa on :$PORT ($SUITE_NAME)..."
RUST_LOG=info "$BINARY" "$CONFIG" > "$LOG" 2>&1 &
NUMA_PID=$!
sleep 4
sleep 2
# Wait for blocklist to load (if blocking is enabled in this suite)
if echo "$SUITE_CONFIG" | grep -q 'enabled = true'; then
for i in $(seq 1 20); do
LOADED=$(curl -sf http://127.0.0.1:$API_PORT/blocking/stats 2>/dev/null \
| grep -o '"domains_loaded":[0-9]*' | cut -d: -f2)
if [ "${LOADED:-0}" -gt 0 ]; then break; fi
sleep 1
done
fi
if ! kill -0 "$NUMA_PID" 2>/dev/null; then
echo "Failed to start Numa:"
@@ -156,6 +177,7 @@ CONF
}
# ---- Suite 1: Recursive mode + DNSSEC ----
if should_run_suite 1; then
echo ""
echo "╔══════════════════════════════════════════╗"
echo "║ Suite 1: Recursive + DNSSEC + Blocking ║"
@@ -224,7 +246,10 @@ kill "$NUMA_PID" 2>/dev/null || true
wait "$NUMA_PID" 2>/dev/null || true
sleep 1
fi # end Suite 1
# ---- Suite 2: Forward mode (backward compat) ----
if should_run_suite 2; then
echo ""
echo "╔══════════════════════════════════════════╗"
echo "║ Suite 2: Forward (DoH) + Blocking ║"
@@ -251,7 +276,10 @@ enabled = true
enabled = false
"
fi # end Suite 2
# ---- Suite 3: Forward UDP (plain, no DoH) ----
if should_run_suite 3; then
echo ""
echo "╔══════════════════════════════════════════╗"
echo "║ Suite 3: Forward (UDP) + No Blocking ║"
@@ -297,7 +325,10 @@ kill "$NUMA_PID" 2>/dev/null || true
wait "$NUMA_PID" 2>/dev/null || true
sleep 1
fi # end Suite 3
# ---- Suite 4: Local zones + Overrides API ----
if should_run_suite 4; then
echo ""
echo "╔══════════════════════════════════════════╗"
echo "║ Suite 4: Local Zones + Overrides API ║"
@@ -406,7 +437,10 @@ kill "$NUMA_PID" 2>/dev/null || true
wait "$NUMA_PID" 2>/dev/null || true
sleep 1
fi # end Suite 4
# ---- Suite 5: DNS-over-TLS (RFC 7858) ----
if should_run_suite 5; then
echo ""
echo "╔══════════════════════════════════════════╗"
echo "║ Suite 5: DNS-over-TLS (RFC 7858) ║"
@@ -528,7 +562,10 @@ CONF
fi
sleep 1
fi # end Suite 5
# ---- Suite 6: Proxy + DoT coexistence ----
if should_run_suite 6; then
echo ""
echo "╔══════════════════════════════════════════╗"
echo "║ Suite 6: Proxy + DoT Coexistence ║"
@@ -622,6 +659,54 @@ CONF
"10.0.0.1" \
"$($KDIG +short dot-test.example A 2>/dev/null)"
echo ""
echo "=== DNS-over-HTTPS (RFC 8484) ==="
DOH_QUERY_FILE=/tmp/numa-doh-query.bin
DOH_RESP_FILE=/tmp/numa-doh-resp.bin
# Build DNS wire-format query for dot-test.example A
printf '\x00\x01\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x08dot-test\x07example\x00\x00\x01\x00\x01' > "$DOH_QUERY_FILE"
# POST valid DoH query
DOH_CODE=$(curl -sk -X POST \
--resolve "numa.numa:$PROXY_HTTPS_PORT:127.0.0.1" \
-H "Content-Type: application/dns-message" \
--data-binary @"$DOH_QUERY_FILE" \
--cacert "$CA" \
-o "$DOH_RESP_FILE" \
-w "%{http_code}" \
"https://numa.numa:$PROXY_HTTPS_PORT/dns-query")
check "DoH POST returns HTTP 200" "200" "$DOH_CODE"
# Check response contains IP 10.0.0.1 (hex: 0a000001)
DOH_HEX=$(xxd -p "$DOH_RESP_FILE" | tr -d '\n')
if echo "$DOH_HEX" | grep -q "0a000001"; then
check "DoH response resolves dot-test.example → 10.0.0.1" "found" "found"
else
check "DoH response resolves dot-test.example → 10.0.0.1" "0a000001" "$DOH_HEX"
fi
# Wrong Content-Type → 415
DOH_CT_CODE=$(curl -sk -X POST \
-H "Host: numa.numa" \
-H "Content-Type: text/plain" \
--data-binary @"$DOH_QUERY_FILE" \
-o /dev/null -w "%{http_code}" \
"https://127.0.0.1:$PROXY_HTTPS_PORT/dns-query")
check "DoH wrong Content-Type → 415" "415" "$DOH_CT_CODE"
# Wrong host → 404 (DoH only serves numa.numa)
DOH_HOST_CODE=$(curl -sk -X POST \
-H "Host: foo.numa" \
-H "Content-Type: application/dns-message" \
--data-binary @"$DOH_QUERY_FILE" \
-o /dev/null -w "%{http_code}" \
"https://127.0.0.1:$PROXY_HTTPS_PORT/dns-query")
check "DoH wrong host → 404" "404" "$DOH_HOST_CODE"
rm -f "$DOH_QUERY_FILE" "$DOH_RESP_FILE"
echo ""
echo "=== Proxy TLS works with DoT enabled ==="
@@ -640,6 +725,332 @@ CONF
rm -rf "$NUMA_DATA"
fi
fi # end Suite 6
# ---- Suite 7: filter_aaaa (IPv4-only networks) ----
if should_run_suite 7; then
echo ""
echo "╔══════════════════════════════════════════╗"
echo "║ Suite 7: filter_aaaa ║"
echo "╚══════════════════════════════════════════╝"
# Config A — filter on, with a local AAAA zone to prove local data bypass.
cat > "$CONFIG" << 'CONF'
[server]
bind_addr = "127.0.0.1:5354"
api_port = 5381
filter_aaaa = true
[upstream]
mode = "forward"
address = "9.9.9.9"
port = 53
[cache]
max_entries = 10000
[blocking]
enabled = false
[proxy]
enabled = false
[[zones]]
domain = "v6.test"
record_type = "AAAA"
value = "2001:db8::1"
ttl = 60
CONF
RUST_LOG=info "$BINARY" "$CONFIG" > "$LOG" 2>&1 &
NUMA_PID=$!
sleep 3
DIG="dig @127.0.0.1 -p $PORT +time=5 +tries=1"
echo ""
echo "=== filter_aaaa = true ==="
# A queries must be untouched.
check "A record resolves under filter_aaaa" \
"." \
"$($DIG google.com A +short | head -1)"
# AAAA must be NOERROR (NODATA), not NXDOMAIN, not SERVFAIL.
check "AAAA returns NOERROR (not NXDOMAIN)" \
"status: NOERROR" \
"$($DIG google.com AAAA 2>&1 | grep 'status:')"
check "AAAA returns zero answers (NODATA shape)" \
"ANSWER: 0" \
"$($DIG google.com AAAA 2>&1 | grep -oE 'ANSWER: [0-9]+' | head -1)"
# Local zone AAAA must survive the filter (PR claim: local data bypasses).
check "Local [[zones]] AAAA bypasses filter" \
"2001:db8::1" \
"$($DIG v6.test AAAA +short)"
# HTTPS RR: ipv6hint (SvcParamKey 6) must be stripped. Query as `type65`
# because dig 9.10.6 (macOS) misparses `HTTPS` as a domain name; `type65`
# works on both 9.10.6 and 9.18. Assert on the raw rdata hex (RFC 3597
# generic format), since dig 9.10.6 doesn't pretty-print HTTPS params.
# cloudflare.com's ipv6hint values sit under the 2606:4700 prefix —
# checking that `26064700` is absent from the rdata hex is a precise,
# upstream-stable signal that the TLV was stripped.
HTTPS_OUT=$($DIG cloudflare.com type65 2>&1)
if echo "$HTTPS_OUT" | grep -qE "cloudflare\.com\..*IN[[:space:]]+TYPE65"; then
HTTPS_HEX=$(echo "$HTTPS_OUT" | grep -A5 "IN[[:space:]]*TYPE65" | tr -d " \t\n")
if echo "$HTTPS_HEX" | grep -qi "26064700"; then
check "HTTPS ipv6hint stripped (2606:4700 absent from rdata)" "absent" "present"
else
check "HTTPS ipv6hint stripped (2606:4700 absent from rdata)" "absent" "absent"
fi
else
# Upstream didn't return an HTTPS record — skip rather than false-pass.
printf " ${DIM}~ HTTPS ipv6hint stripped (skipped: no HTTPS RR returned by upstream)${RESET}\n"
fi
kill "$NUMA_PID" 2>/dev/null || true
wait "$NUMA_PID" 2>/dev/null || true
sleep 1
# Config B — filter off. Regression guard: prove AAAA answers come back
# when the flag isn't set, so a network failure in Config A can't silently
# pass as "filter working".
cat > "$CONFIG" << 'CONF'
[server]
bind_addr = "127.0.0.1:5354"
api_port = 5381
[upstream]
mode = "forward"
address = "9.9.9.9"
port = 53
[cache]
max_entries = 10000
[blocking]
enabled = false
[proxy]
enabled = false
CONF
RUST_LOG=info "$BINARY" "$CONFIG" > "$LOG" 2>&1 &
NUMA_PID=$!
sleep 3
echo ""
echo "=== filter_aaaa unset (regression guard) ==="
check "AAAA returns real answers with filter off" \
":" \
"$($DIG google.com AAAA +short | head -1)"
kill "$NUMA_PID" 2>/dev/null || true
wait "$NUMA_PID" 2>/dev/null || true
sleep 1
fi # end Suite 7
# ---- Suite 8: ODoH (Oblivious DoH via public relay + target) ----
# Exercises the full client pipeline: /.well-known/odohconfigs fetch,
# HPKE seal/unseal, URL-query target routing (RFC 9230 §5), dashboard
# QueryPath::Odoh counter. Depends on the public ecosystem being up —
# the probe-odoh-ecosystem.sh script guards against flaky runs.
if should_run_suite 8; then
echo ""
echo "╔══════════════════════════════════════════╗"
echo "║ Suite 8: ODoH (Anonymous DNS) ║"
echo "╚══════════════════════════════════════════╝"
run_test_suite "ODoH via edgecompute.app relay → Cloudflare target" "
[server]
bind_addr = \"127.0.0.1:5354\"
api_port = 5381
[upstream]
mode = \"odoh\"
relay = \"https://odoh-relay.edgecompute.app/proxy\"
target = \"https://odoh.cloudflare-dns.com/dns-query\"
[cache]
max_entries = 10000
min_ttl = 60
max_ttl = 86400
[blocking]
enabled = false
[proxy]
enabled = false
"
# Re-start briefly to assert ODoH-specific observability: the odoh counter
# has to tick above zero after a query, and the stats label has to reflect
# the oblivious path. These guard against silent regressions in the
# QueryPath::Odoh tagging and the /stats serialisation.
RUST_LOG=info "$BINARY" "$CONFIG" > "$LOG" 2>&1 &
NUMA_PID=$!
for _ in $(seq 1 30); do
curl -sf "http://127.0.0.1:$API_PORT/health" >/dev/null 2>&1 && break
sleep 0.1
done
$DIG example.com A +short > /dev/null 2>&1 || true
sleep 1
STATS=$(curl -sf http://127.0.0.1:$API_PORT/stats 2>/dev/null)
# upstream_transport.odoh lives inside the upstream_transport object.
ODOH_COUNT=$(echo "$STATS" | grep -o '"upstream_transport":{[^}]*}' \
| grep -o '"odoh":[0-9]*' | cut -d: -f2)
check "upstream_transport.odoh > 0 after a query" "[1-9]" "${ODOH_COUNT:-0}"
check "Upstream label advertises odoh://" \
"odoh://" \
"$(echo "$STATS" | grep -o '"upstream":"[^"]*"')"
check "Stats mode field is 'odoh'" \
'"mode":"odoh"' \
"$(echo "$STATS" | grep -o '"mode":"odoh"')"
# Strict-mode failure path: a clearly-unreachable relay must produce
# SERVFAIL without silent downgrade. We hijack the config to point at
# an .invalid host so we don't rely on external uptime.
kill "$NUMA_PID" 2>/dev/null || true
wait "$NUMA_PID" 2>/dev/null || true
sleep 1
cat > "$CONFIG" << 'CONF'
[server]
bind_addr = "127.0.0.1:5354"
api_port = 5381
[upstream]
mode = "odoh"
relay = "https://relay.invalid/proxy"
target = "https://odoh.cloudflare-dns.com/dns-query"
strict = true
[cache]
max_entries = 10000
[blocking]
enabled = false
[proxy]
enabled = false
CONF
RUST_LOG=info "$BINARY" "$CONFIG" > "$LOG" 2>&1 &
NUMA_PID=$!
for _ in $(seq 1 30); do
curl -sf "http://127.0.0.1:$API_PORT/health" >/dev/null 2>&1 && break
sleep 0.1
done
check "Strict-mode relay outage returns SERVFAIL" \
"SERVFAIL" \
"$($DIG example.com A 2>&1 | grep 'status:')"
kill "$NUMA_PID" 2>/dev/null || true
wait "$NUMA_PID" 2>/dev/null || true
sleep 1
# Negative: relay and target on the same host must be rejected at startup.
cat > "$CONFIG" << 'CONF'
[server]
bind_addr = "127.0.0.1:5354"
api_port = 5381
[upstream]
mode = "odoh"
relay = "https://odoh.cloudflare-dns.com/proxy"
target = "https://odoh.cloudflare-dns.com/dns-query"
CONF
STARTUP_OUT=$("$BINARY" "$CONFIG" 2>&1 || true)
check "Same-host relay+target rejected at startup" \
"same host" \
"$STARTUP_OUT"
fi # end Suite 8
# ---- Suite 9: Numa's own ODoH relay (--relay-mode) ----
# Exercises `numa relay PORT` as a forwarding proxy to a real ODoH target.
# Validates the RFC 9230 §5 relay behaviour: URL-query routing, content-type
# gating, body-size cap, and /health observability.
if should_run_suite 9; then
echo ""
echo "╔══════════════════════════════════════════╗"
echo "║ Suite 9: Numa ODoH Relay (own) ║"
echo "╚══════════════════════════════════════════╝"
RELAY_PORT=18443
"$BINARY" relay $RELAY_PORT > "$LOG" 2>&1 &
NUMA_PID=$!
for _ in $(seq 1 30); do
curl -sf "http://127.0.0.1:$RELAY_PORT/health" >/dev/null 2>&1 && break
sleep 0.1
done
echo ""
echo "=== Relay Endpoints ==="
check "Health endpoint returns ok" \
"ok" \
"$(curl -sf http://127.0.0.1:$RELAY_PORT/health | head -1)"
# Happy path: forwards arbitrary body to Cloudflare's ODoH target. The
# target will reject the garbage envelope with HTTP 400 — which is exactly
# what proves our relay faithfully forwarded (otherwise we'd see our own
# 4xx from the relay itself).
HAPPY_STATUS=$(curl -sS -o /dev/null -w "%{http_code}" -X POST \
-H "Content-Type: application/oblivious-dns-message" \
--data-binary "garbage-forwarded-end-to-end" \
"http://127.0.0.1:$RELAY_PORT/relay?targethost=odoh.cloudflare-dns.com&targetpath=/dns-query")
check "Relay forwards to target (target rejects garbage → 400)" \
"400" \
"$HAPPY_STATUS"
echo ""
echo "=== Guards ==="
check "Missing content-type → 415" \
"415" \
"$(curl -sS -o /dev/null -w '%{http_code}' -X POST --data-binary 'x' \
'http://127.0.0.1:'$RELAY_PORT'/relay?targethost=odoh.cloudflare-dns.com&targetpath=/dns-query')"
check "Oversized body (>4 KiB) → 413" \
"413" \
"$(head -c 5000 /dev/urandom | curl -sS -o /dev/null -w '%{http_code}' -X POST \
-H 'Content-Type: application/oblivious-dns-message' --data-binary @- \
'http://127.0.0.1:'$RELAY_PORT'/relay?targethost=odoh.cloudflare-dns.com&targetpath=/dns-query')"
check "Invalid targethost (no dot) → 400" \
"400" \
"$(curl -sS -o /dev/null -w '%{http_code}' -X POST \
-H 'Content-Type: application/oblivious-dns-message' --data-binary 'x' \
'http://127.0.0.1:'$RELAY_PORT'/relay?targethost=invalid&targetpath=/dns-query')"
echo ""
echo "=== Counters ==="
HEALTH=$(curl -sf "http://127.0.0.1:$RELAY_PORT/health")
check "Relay counted at least one forwarded_ok" \
"[1-9]" \
"$(echo "$HEALTH" | grep 'forwarded_ok' | awk '{print $2}')"
check "Relay counted at least one rejected_bad_request" \
"[1-9]" \
"$(echo "$HEALTH" | grep 'rejected_bad_request' | awk '{print $2}')"
kill "$NUMA_PID" 2>/dev/null || true
wait "$NUMA_PID" 2>/dev/null || true
sleep 1
fi # end Suite 9
# Summary
echo ""
TOTAL=$((PASSED + FAILED))

101
tests/probe-odoh-ecosystem.sh Executable file
View File

@@ -0,0 +1,101 @@
#!/usr/bin/env bash
# Probe the public ODoH ecosystem.
#
# Source of truth: DNSCrypt's curated list at
# https://github.com/DNSCrypt/dnscrypt-resolvers/tree/master/v3
# - v3/odoh-servers.md (ODoH targets)
# - v3/odoh-relays.md (ODoH relays)
#
# As of commit 2025-09-16 ("odohrelay-crypto-sx seems to be the only ODoH
# relay left"), the full public ecosystem is 4 targets + 1 relay. Re-run this
# script against the upstream list before making any "only N public relays"
# claim publicly.
#
# Usage: ./tests/probe-odoh-ecosystem.sh
set -uo pipefail
GREEN="\033[32m"
RED="\033[31m"
YELLOW="\033[33m"
DIM="\033[90m"
RESET="\033[0m"
UP=0
DOWN=0
probe_target() {
local name="$1"
local host="$2"
local url="https://${host}/.well-known/odohconfigs"
local start=$(date +%s%N)
local headers
headers=$(curl -sS -o /tmp/odoh-probe-body -D - --max-time 5 -A "numa-odoh-probe/0.1" "$url" 2>&1) || {
DOWN=$((DOWN + 1))
printf " ${RED}${RESET} %-25s ${DIM}unreachable${RESET}\n" "$name"
return
}
local elapsed_ms=$((($(date +%s%N) - start) / 1000000))
local status
status=$(echo "$headers" | head -1 | awk '{print $2}')
local ctype
ctype=$(echo "$headers" | grep -i '^content-type:' | head -1 | tr -d '\r')
local size
size=$(stat -f%z /tmp/odoh-probe-body 2>/dev/null || stat -c%s /tmp/odoh-probe-body 2>/dev/null || echo 0)
if [[ "$status" == "200" ]] && [[ "$size" -gt 0 ]]; then
UP=$((UP + 1))
printf " ${GREEN}${RESET} %-25s ${DIM}%4dms %s bytes %s${RESET}\n" "$name" "$elapsed_ms" "$size" "$ctype"
else
DOWN=$((DOWN + 1))
printf " ${RED}${RESET} %-25s ${DIM}status=%s size=%s${RESET}\n" "$name" "$status" "$size"
fi
rm -f /tmp/odoh-probe-body
}
probe_relay() {
# Relays don't expose /.well-known/odohconfigs — we just verify TLS reachability
# and that the endpoint responds to a malformed POST with an HTTP error
# (indicating the relay path exists). A real ODoH validation requires HPKE.
local name="$1"
local url="$2"
local start=$(date +%s%N)
local status
status=$(curl -sS -o /dev/null -w "%{http_code}" --max-time 5 -A "numa-odoh-probe/0.1" \
-X POST -H "Content-Type: application/oblivious-dns-message" \
--data-binary "" "$url" 2>&1) || {
DOWN=$((DOWN + 1))
printf " ${RED}${RESET} %-25s ${DIM}unreachable${RESET}\n" "$name"
return
}
local elapsed_ms=$((($(date +%s%N) - start) / 1000000))
# Any 2xx or 4xx means the endpoint is live (TLS works, HTTP responded).
# 5xx or 000 (curl failure) means broken.
if [[ "$status" =~ ^[24] ]]; then
UP=$((UP + 1))
printf " ${GREEN}${RESET} %-25s ${DIM}%4dms status=%s (endpoint live)${RESET}\n" "$name" "$elapsed_ms" "$status"
else
DOWN=$((DOWN + 1))
printf " ${RED}${RESET} %-25s ${DIM}status=%s${RESET}\n" "$name" "$status"
fi
}
echo "ODoH targets:"
probe_target "Cloudflare" "odoh.cloudflare-dns.com"
probe_target "crypto.sx" "odoh.crypto.sx"
probe_target "Snowstorm" "dope.snowstorm.love"
probe_target "Tiarap" "doh.tiarap.org"
echo
echo "ODoH relays:"
probe_relay "Frank Denis (Fastly)" "https://odoh-relay.edgecompute.app/proxy"
echo
TOTAL=$((UP + DOWN))
if [[ "$DOWN" -eq 0 ]]; then
printf "${GREEN}All %d endpoints up${RESET}\n" "$TOTAL"
exit 0
else
printf "${YELLOW}%d/%d up, %d down${RESET}\n" "$UP" "$TOTAL" "$DOWN"
exit 1
fi