Plain host-string equality caught the copy-paste-same-URL footgun but
let `r.cloudflare.com` + `odoh.cloudflare.com` through — two subdomains
of the same operator collapse ODoH to ordinary DoH. Add a second layer:
compare registrable domains via the PSL (`psl` crate) after the exact-
host check. Fails open on IP literals and unparseable hosts; the exact-
host check still runs in those cases.
- Hoist ODOH_CONTENT_TYPE to a single pub(crate) constant in odoh.rs;
relay.rs imports it instead of declaring its own.
- Generalize dashboard encryptionPct(data, encryptedKeys, allKeys)
so both Inbound Wire and Outbound Wire panels share the same math
instead of drifting independently.
- Extract RelayState::new() and build_app() helpers in relay.rs so
the test spawn_relay() and production run() wire the same router
+ body-limit layer. Prevents future middleware from landing in one
path but not the other.
All 344 lib tests pass; no behavior change.
- `numa relay [PORT] [BIND]` accepts an optional bind address (defaults
to 127.0.0.1, matching the Caddy reverse-proxy deployment shape).
Required for Docker, where the relay needs 0.0.0.0 inside the
container so Caddy can reach it across the bridge network.
- Dashboard now surfaces the upstream_transport dimension as an
"Outbound Wire" panel alongside the existing "Inbound Wire" (renamed
from "Transport" for directional clarity). Sub-headers — "apps → numa"
/ "numa → internet" — make the threat-model split obvious without
jargon. Bars: UDP/DoH/DoT/ODoH, headline "X% encrypted outbound".
The PR description's promise that "the dashboard answers how much of
my DNS traffic left in cleartext honestly" is now true.
Two issues surfaced from running mode = "odoh" against the live Hetzner
relay as system DNS:
1. **Bootstrap deadlock.** The reqwest HTTPS client resolves the relay
and target hostnames via system DNS. When numa is itself the system
resolver, the ODoH client loops trying to resolve through itself.
Adds optional `relay_ip` and `target_ip` to `[upstream]`, plumbed
into reqwest's `resolve()` so the HTTPS client bypasses system DNS
for those two hostnames. TLS still validates against the URL
hostname, so a stale IP fails loudly rather than silently MITM'ing.
2. **2x relay load.** Default `hedge_ms = 10` triggers a duplicate
in-flight query for every request. Useful for UDP/DoH/DoT (rescues
tail latency cheaply); wasteful for ODoH (doubles HPKE seal/unseal,
doubles sealed-byte footprint a passive observer can correlate, no
latency win — relay hop dominates either way). Force-zero in
oblivious mode regardless of configured hedge_ms.
Validated end-to-end against odoh-relay.numa.rs → Cloudflare:
3 digs produced 3 forwarded_ok on the relay (was 6 before the hedge
fix), upstream_transport.odoh ticks correctly.
Client (mode = "odoh"): URL-query target routing per RFC 9230 §5,
/.well-known/odohconfigs TTL cache with 60s backoff on failure, HPKE
seal/open via odoh-rs, strict-mode default that SERVFAILs on relay
failure instead of silently downgrading. Host-equality config
validation rejects same-operator relay/target pairs.
Relay (`numa relay [PORT]`): axum server with /relay + /health.
SSRF-hardened hostname validator (RFC 1035 ASCII + dot + dash),
4 KiB body cap at the axum layer, 5s full-transaction timeout, and
static 502 on target failure (reqwest internals logged, not leaked).
Aggregate counters only — no per-request logs.
Observability: new `UpstreamTransport { Udp, Doh, Dot, Odoh }`
orthogonal to `QueryPath`, so /stats can tally wire protocols
symmetrically. Recursive mode records `Some(Udp)` for honest
"bytes egressing in cleartext" accounting.
Tests: Suite 8 exercises the client end-to-end via Frank Denis's
public relay + Cloudflare target; Suite 9 exercises `numa relay`
forwarding + guards against Cloudflare as the real far end. Full
probe script at tests/probe-odoh-ecosystem.sh verifies the entire
public ODoH ecosystem (4 targets + 1 relay per DNSCrypt's curated
list — confirms deploying Numa's relay doubles global supply).
Adding a record type used to require 5 edits across the file (enum
variant, to_num, from_num, as_str, parse_str). The macro takes a
single (variant, num, str) tuple per type and generates the enum
plus all four methods.
UNKNOWN(u16) stays hand-coded since it carries data and can't sit
in the table.
src/question.rs: 156 lines -> 92 lines, no behavior change.
Logs were printing UNKNOWN(64), UNKNOWN(29), UNKNOWN(35) for SVCB,
LOC, and NAPTR — three RR types that have been registered for years
and show up in the wild (notably SVCB via RFC 9462 DDR clients
querying _dns.resolver.arpa).
Adds the variants and replaces the SVCB_QTYPE u16 const introduced
in #119 with QueryType::SVCB.to_num(), matching the HTTPS path.
Closes#114.
HTTPS (65) and SVCB (64) share the RDATA wire format, so the existing
parser already handles both — only the call site was HTTPS-only. Widen
the qtype check and extend the existing pipeline test with a second
query for SVCB.
- Drop `const HTTPS_TYPE: u16 = 65;` in favor of `QueryType::HTTPS.to_num()`
at the single call site — avoids a fresh magic number alongside the
existing enum mapping in question.rs.
- Add `DnsPacket::for_each_record_mut` so `strip_https_ipv6_hints` stops
hand-rolling the answers/authorities/resources walk; future section
rewrites go through the same helper.
- Promote the SVCB test-rdata builder from `svcb::tests` to module scope
as `pub(crate) #[cfg(test)] fn build_rdata`, and reuse it in the two
pipeline tests in ctx.rs — kills ~20 lines of byte-fiddling and keeps
one RDATA-construction code path.
Modifying HTTPS rdata invalidates any accompanying RRSIG, so a DNSSEC-
validating downstream would reject the response as Bogus. Gate the
strip on !client_do, matching the existing DNSSEC-records strip.
Adds a regression test that catches the gate being removed: builds a
query with EDNS DO=1, asserts the HTTPS rdata round-trips untouched.
When enabled, AAAA queries short-circuit to NODATA (NOERROR + empty
answer) so Happy Eyeballs clients don't stall waiting on a v6 address
they can't use. Also strips `ipv6hint` SvcParam from HTTPS/SVCB
answers (RFC 9460) so Chrome ≥103, Firefox, and Safari don't bypass
the AAAA filter via the HTTPS record path.
Local data is preserved: overrides, zones, the .numa proxy, and the
blocklist sinkhole keep whatever v6 addresses they configure — the
filter only kicks in on the cache/forward/recursive path. NODATA is
correct per RFC 2308 here; NXDOMAIN would incorrectly imply the name
doesn't exist for A queries either.
Off by default. Opt in via `filter_aaaa = true` under `[server]`.
Re-install failed with ETXTBSY (Text file busy) because std::fs::copy
can't overwrite a binary that's currently being executed by the
running service. Switch to copy-then-rename: write the new binary to
/usr/local/bin/numa.new, then rename over /usr/local/bin/numa. Rename
swaps the path while the running process keeps the old inode alive,
so DNS keeps serving from the previous binary until restart.
Bump systemctl start to restart so the new binary actually loads on
re-install (start is a no-op when the unit is already active, which
would silently leave the old binary running).
Locally verified the full CI sequence: install → curl → reinstall →
curl → uninstall → curl-fails. All three assertions pass.
Linux install_service_linux now does the {{exe_path}} substitution
inline because it uses the (potentially copied) binary path returned
by install_service_binary_linux, not current_exe(). The shared
replace_exe_path helper is dead on Linux — clippy -D warnings caught it.
Narrow the function to macos and split the placeholder test: keep the
"both templates contain {{exe_path}}" assertion as a cross-platform test
(catches placeholder removal on either file), keep the substitution test
gated to macos where the function lives.
DynamicUser=yes' transient account can only traverse world-x directories.
The CI binary at /home/runner/work/numa/numa/target/release/numa fails
exec with EACCES because /home/runner is mode 0700; same applies to a
build under /home/<user>/, ~/.cargo/bin, or any private $HOME tree.
install_service_binary_linux now walks the binary's path. If every
ancestor grants world-execute (Linuxbrew /home/linuxbrew is 0755,
/usr/local/bin is fine, install.sh layout works), keep the source
path so brew/distro upgrades propagate in place. Otherwise copy to
/usr/local/bin/numa and reference that in the unit.
Locally verified both branches in an Ubuntu 24.04 systemd container:
- CI-like /home/runner (0700) → copies + service binds 5380
- Brew-like /home/linuxbrew (0755) → keeps source path + service binds 5380
AUR installs never call `numa install` — PKGBUILD drops the unit straight
into /usr/lib/systemd/system and the user runs `systemctl enable numa`.
With User=numa the Rust installer's useradd code never fires there,
breaking Arch out of the box.
DynamicUser=yes sidesteps packaging entirely — systemd allocates a
transient UID per start and remaps StateDirectory ownership (including
legacy root-owned trees) automatically. Works on any modern systemd.
Drops the ensure_numa_user_linux/chown helpers plus NUMA_USER; the
unit file alone now captures the privilege-drop story.
- numa.service: User=numa + CAP_NET_BIND_SERVICE + sandboxing block
(ProtectSystem=strict, PrivateTmp, seccomp @system-service, etc)
- install_service_linux: create numa system user + chown data_dir
before first start so TLS-cert generation and state writes land
on a numa-owned tree
Runtime verified root-free on Linux — network_watch_loop only reads
/etc/resolv.conf; all system-DNS mutation stays in the installer,
which continues to run as root via sudo.
The SRTT ordering + failure penalty path was UDP-only, so a DoT primary
in a forwarding-rule pool was never deprioritized on failure and all
DoT entries tied at INITIAL_SRTT_MS in the sort key. With [[forwarding]]
now accepting arrays of upstreams, DoT pools are a first-class case and
need the same healthiest-first behavior the default pool gets for UDP.
- Add Upstream::tracked_ip() → Some(ip) for Udp/Dot, None for Doh
(DoH has no stable IP — reqwest pools connections by hostname).
- Rewire the three SRTT call sites in forward_with_failover_raw.
- Hoist srtt.read() out of the candidate-scoring loop — one lock per
query instead of N (matters now that pools commonly have N>1).
- Drop unused #[derive(Debug)] on UpstreamPool and ForwardingRule.
- Regression tests: udp_failure_records_in_srtt + dot_failure_records_in_srtt.
Resolves src/main.rs conflict: serve loop was extracted into src/serve.rs on main (PR #107). Ported the forwarding-rule log change to serve.rs — fwd.upstream is now Vec<String>, logged with join(", ").
Extract parse_sc_registered and parse_sc_state as testable pure
functions. 8 new tests covering: service registration detection,
service state parsing, and Windows config_dir == data_dir invariant.
Stop the running service before disabling Dnscache so the port 53 probe
sees the real state (not Numa's own binding). Wait for SCM STOPPED
state before copying the binary to avoid os error 32 (file in use).
config_dir() on Windows now returns data_dir() (ProgramData) so config,
services.json, and log file are in the same place for both interactive
and service contexts. Service mode writes logs to numa.log via
env_logger pipe. Dashboard shows correct log path per OS.
Probe port 53 after disabling Dnscache instead of assuming reboot is
needed. Skip DNS redirect when port is blocked (service does it on
first boot). Fix readiness probe: TCP connect to API port instead of
broken UDP send_to that always succeeded.
service start/stop/restart/status now map to proper SCM operations
instead of re-running the full install/uninstall flow. On re-install,
stop the running service first so the binary can be overwritten.
Adds a build.rs that runs `git describe --tags --always --dirty` and
sets NUMA_BUILD_VERSION at compile time. A new `numa::version()` helper
returns the build version, falling back to CARGO_PKG_VERSION when git
is unavailable (source tarballs, Docker builds without .git).
Version strings:
tagged release: 0.13.1
commits ahead: 0.13.1+a87f907
uncommitted changes: 0.13.1+a87f907-dirty
no git: 0.13.1
Replaces all 6 inline env!("CARGO_PKG_VERSION") call sites with the
single version() function.
Closes#108.
- Add `version` field to /stats (from CARGO_PKG_VERSION).
- Show `v0.13.1` next to the Numa wordmark in the dashboard header.
- Restructure the footer into two semantic rows:
Row 1 (paths): Config · Data · Logs (platform-detected)
Row 2 (runtime): Upstream · DNSSEC · SRTT · GitHub
- Drop Mode from the footer (redundant with Upstream label).
- Show only the matching-platform log path instead of both
macOS and Linux unconditionally.
- Drop the duplicate WINDOWS_SERVICE_NAME constant; call sites use the
single source of truth at windows_service::SERVICE_NAME.
- windows_service_exe_path and service_config_path now compose from
crate::data_dir() instead of re-parsing %PROGRAMDATA% locally.
- Factor the 6× sc.exe invocation boilerplate into a run_sc helper.
- Replace the 200ms try_recv polling loop in the service dispatcher
with a recv_timeout wait — cuts shutdown latency and idle CPU.
- stop_service_scm/delete_service_scm now log warnings instead of
silently swallowing failures, so unexpected errors are visible.
Hooks the service-dispatcher scaffolding from the previous commit to
actually serve DNS, and replaces the HKLM\…\Run login-time autostart
with a proper Windows service created via sc.exe.
**Refactor**
- Extract main.rs's inline server body (~500 lines) into `numa::serve::run`
so both the interactive CLI entry and the service dispatcher drive the
same startup/serve loop. main.rs is now a thin subcommand router.
- main.rs goes sync (no #[tokio::main]); each branch that needs async
builds its own runtime and block_on's. Required so the --service path
can hand off to SCM without fighting tokio for the entry thread.
**Windows service wrapper**
- `numa::windows_service::run_service` now builds a multi-thread tokio
runtime on a dedicated thread and runs `serve::run` inside it. Stop/
Shutdown from SCM aborts the wait loop and reports SERVICE_STOPPED.
- Config path resolves to `%PROGRAMDATA%\numa\numa.toml` when running
under SCM (SYSTEM's cwd is System32, relative paths don't work).
**Install/uninstall**
- `install_windows` now copies numa.exe to a stable
`%PROGRAMDATA%\numa\bin\numa.exe` and registers it via `sc create`
with start=auto, obj=LocalSystem, and a failure policy of
restart/5000/restart/5000/restart/10000. Starts the service
immediately when no reboot is pending.
- `uninstall_windows` stops + deletes the service and removes the
binary copy before restoring DNS.
- Drops the old `register_autostart` / `remove_autostart` helpers that
wrote to `HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run` — that
path runs at user login in the user's session with no stderr capture
and no crash-restart policy, which is why we've been flying blind in
every Windows debug session.
DNS-set bugs (netsh destructive static, IPv6 not touched, uninstall
secondary-drop) and file logging are orthogonal — tracked for follow-up.
Lets numa.exe act as a real Windows service registered with the SCM,
replacing the HKLM\...\Run login-time autostart that runs in the user
session without stderr capture.
- New `numa::windows_service` module (cfg(windows)) wraps Mullvad's
`windows-service` crate: registers with SCM, reports Running, handles
Stop/Shutdown, reports Stopped.
- `numa.exe --service` is the entry point SCM uses
(`sc create … binPath="numa.exe --service"`); interactive invocations
are unchanged.
- Dep is gated `[target.'cfg(windows)'.dependencies]` — zero impact on
macOS/Linux builds or binary size.
Scaffold only. The service currently blocks on an mpsc channel until
Stop arrives; the actual serve loop will hook in once main.rs's inline
server body is extracted into `numa::serve(config_path)` in a follow-up.
This lets `sc start Numa` / `sc stop Numa` be verified end to end today.
Resolves conflict in src/ctx.rs — both sides added independent tokio
tests (forwarding fail-over on this branch, default-pool upstream path
on main from #103). Keep both.
Mirrors `[upstream] address` — `upstream` accepts string or array
of strings, builds an `UpstreamPool` and routes queries through
`forward_with_failover_raw` so SRTT ordering and failover apply to
matched `[[forwarding]]` rules the same way they do for the default
pool.
Single-string rules keep their current behavior (one-element pool,
equivalent single-upstream path). Empty array errors at config load.
Addresses item 1 of issue #102. Plan: docs/102_item1.md.
Queries matching a [[forwarding]] suffix rule now log as FORWARD;
queries resolved via the default [upstream] pool log as UPSTREAM.
Previously both paths shared the FORWARD label, making it impossible
to tell from logs whether a rule matched.
Adds QueryPath::Upstream, a queries.upstream stats counter exposed
via /stats, plus a matching dashboard filter, bar, and path tag.
Closes part of #102.
Config-level forwarding rules were parsed with the UDP-only
`parse_upstream_addr` helper, silently rejecting the DoT/DoH schemes
that the rest of the forwarding pipeline already supports.
Widen `ForwardingRule.upstream` from `SocketAddr` to `Upstream` so
config rules reuse the same parser as `[upstream].address` and
`fallback`. Demote `parse_upstream_addr` to `pub(crate)` to prevent
the same mistake recurring.
Closes#100.
Test each pipeline stage in isolation through resolve_query:
- override takes precedence over all other paths
- localhost and *.localhost resolve to loopback
- local zone returns configured records
- .tld proxy resolves registered services to loopback
- blocklist sinkholes to 0.0.0.0
- cache hit returns stored response without upstream
resolve_query now returns (BytePacketBuffer, QueryPath) so callers
and tests can inspect the resolution path without reading the query
log. Production call sites (UDP, DoT, DoH) destructure and ignore it.
The forwarding test now uses a mock UDP upstream that replies with a
canned response, asserting QueryPath::Forwarded instead of != Local.
Explicit [[forwarding]] rules now take precedence over the RFC 6303
special-use domain intercept. Previously, PTR queries for private
ranges (e.g. 168.192.in-addr.arpa) always returned local NXDOMAIN
even when a forwarding rule pointed them at a corporate DNS server.
Add full-pipeline resolve_query test harness (test_ctx + resolve_in_test)
and two tests covering both the default behavior and the override.
Closes#94