- Hoist ODOH_CONTENT_TYPE to a single pub(crate) constant in odoh.rs;
relay.rs imports it instead of declaring its own.
- Generalize dashboard encryptionPct(data, encryptedKeys, allKeys)
so both Inbound Wire and Outbound Wire panels share the same math
instead of drifting independently.
- Extract RelayState::new() and build_app() helpers in relay.rs so
the test spawn_relay() and production run() wire the same router
+ body-limit layer. Prevents future middleware from landing in one
path but not the other.
All 344 lib tests pass; no behavior change.
Two-container deploy: Caddy terminates TLS (auto-provisions Let's
Encrypt via ACME) and reverse-proxies to a Numa relay on an internal
Docker network. The relay never reads sealed payloads; Caddy's
access log is discarded so per-request observability doesn't defeat
the oblivious property.
Validated against Hetzner CX22 + DNS at odoh-relay.numa.rs:
- TLS-ALPN-01 challenge succeeded on first attempt
- /health returned the relay's counter block
- End-to-end ODoH client → relay → Cloudflare works
Operators only need to: set a DNS A record, edit Caddyfile's hostname,
docker compose up -d. README walks through the steps and the DNSCrypt
v3/odoh-relays.md submission to claim a public listing.
- `numa relay [PORT] [BIND]` accepts an optional bind address (defaults
to 127.0.0.1, matching the Caddy reverse-proxy deployment shape).
Required for Docker, where the relay needs 0.0.0.0 inside the
container so Caddy can reach it across the bridge network.
- Dashboard now surfaces the upstream_transport dimension as an
"Outbound Wire" panel alongside the existing "Inbound Wire" (renamed
from "Transport" for directional clarity). Sub-headers — "apps → numa"
/ "numa → internet" — make the threat-model split obvious without
jargon. Bars: UDP/DoH/DoT/ODoH, headline "X% encrypted outbound".
The PR description's promise that "the dashboard answers how much of
my DNS traffic left in cleartext honestly" is now true.
Two issues surfaced from running mode = "odoh" against the live Hetzner
relay as system DNS:
1. **Bootstrap deadlock.** The reqwest HTTPS client resolves the relay
and target hostnames via system DNS. When numa is itself the system
resolver, the ODoH client loops trying to resolve through itself.
Adds optional `relay_ip` and `target_ip` to `[upstream]`, plumbed
into reqwest's `resolve()` so the HTTPS client bypasses system DNS
for those two hostnames. TLS still validates against the URL
hostname, so a stale IP fails loudly rather than silently MITM'ing.
2. **2x relay load.** Default `hedge_ms = 10` triggers a duplicate
in-flight query for every request. Useful for UDP/DoH/DoT (rescues
tail latency cheaply); wasteful for ODoH (doubles HPKE seal/unseal,
doubles sealed-byte footprint a passive observer can correlate, no
latency win — relay hop dominates either way). Force-zero in
oblivious mode regardless of configured hedge_ms.
Validated end-to-end against odoh-relay.numa.rs → Cloudflare:
3 digs produced 3 forwarded_ok on the relay (was 6 before the hedge
fix), upstream_transport.odoh ticks correctly.
Client (mode = "odoh"): URL-query target routing per RFC 9230 §5,
/.well-known/odohconfigs TTL cache with 60s backoff on failure, HPKE
seal/open via odoh-rs, strict-mode default that SERVFAILs on relay
failure instead of silently downgrading. Host-equality config
validation rejects same-operator relay/target pairs.
Relay (`numa relay [PORT]`): axum server with /relay + /health.
SSRF-hardened hostname validator (RFC 1035 ASCII + dot + dash),
4 KiB body cap at the axum layer, 5s full-transaction timeout, and
static 502 on target failure (reqwest internals logged, not leaked).
Aggregate counters only — no per-request logs.
Observability: new `UpstreamTransport { Udp, Doh, Dot, Odoh }`
orthogonal to `QueryPath`, so /stats can tally wire protocols
symmetrically. Recursive mode records `Some(Udp)` for honest
"bytes egressing in cleartext" accounting.
Tests: Suite 8 exercises the client end-to-end via Frank Denis's
public relay + Cloudflare target; Suite 9 exercises `numa relay`
forwarding + guards against Cloudflare as the real far end. Full
probe script at tests/probe-odoh-ecosystem.sh verifies the entire
public ODoH ecosystem (4 targets + 1 relay per DNSCrypt's curated
list — confirms deploying Numa's relay doubles global supply).
Adding a record type used to require 5 edits across the file (enum
variant, to_num, from_num, as_str, parse_str). The macro takes a
single (variant, num, str) tuple per type and generates the enum
plus all four methods.
UNKNOWN(u16) stays hand-coded since it carries data and can't sit
in the table.
src/question.rs: 156 lines -> 92 lines, no behavior change.
Logs were printing UNKNOWN(64), UNKNOWN(29), UNKNOWN(35) for SVCB,
LOC, and NAPTR — three RR types that have been registered for years
and show up in the wild (notably SVCB via RFC 9462 DDR clients
querying _dns.resolver.arpa).
Adds the variants and replaces the SVCB_QTYPE u16 const introduced
in #119 with QueryType::SVCB.to_num(), matching the HTTPS path.
Closes#114.
HTTPS (65) and SVCB (64) share the RDATA wire format, so the existing
parser already handles both — only the call site was HTTPS-only. Widen
the qtype check and extend the existing pipeline test with a second
query for SVCB.
- Drop `const HTTPS_TYPE: u16 = 65;` in favor of `QueryType::HTTPS.to_num()`
at the single call site — avoids a fresh magic number alongside the
existing enum mapping in question.rs.
- Add `DnsPacket::for_each_record_mut` so `strip_https_ipv6_hints` stops
hand-rolling the answers/authorities/resources walk; future section
rewrites go through the same helper.
- Promote the SVCB test-rdata builder from `svcb::tests` to module scope
as `pub(crate) #[cfg(test)] fn build_rdata`, and reuse it in the two
pipeline tests in ctx.rs — kills ~20 lines of byte-fiddling and keeps
one RDATA-construction code path.
Modifying HTTPS rdata invalidates any accompanying RRSIG, so a DNSSEC-
validating downstream would reject the response as Bogus. Gate the
strip on !client_do, matching the existing DNSSEC-records strip.
Adds a regression test that catches the gate being removed: builds a
query with EDNS DO=1, asserts the HTTPS rdata round-trips untouched.
Suite 7 exercises the full pipeline end-to-end: A resolves, AAAA returns
NODATA, local [[zones]] AAAA bypasses the filter, and HTTPS ipv6hint is
stripped from a real cloudflare.com response. A second config run with
the flag unset guards against network-failure false-positives.
SUITES=N (comma list) runs a subset, e.g. `SUITES=7 bash tests/integration.sh`
skips suites 1-6 for fast iteration.
Three scenarios CI cannot run: every advertised port is functional (DNS
resolves, TLS chain validates against numa's CA, HTTP/API respond), CA
fingerprint survives upgrade from pre-drop layout, binary staging
fallback from a 0700 source dir. Self-bootstraps a privileged
systemd-as-PID1 container — no dependency on long-lived test containers.
MainPID user assertion retries until comm=numa to avoid a race where
systemctl reports active while MainPID still points at a transitional
process.
When enabled, AAAA queries short-circuit to NODATA (NOERROR + empty
answer) so Happy Eyeballs clients don't stall waiting on a v6 address
they can't use. Also strips `ipv6hint` SvcParam from HTTPS/SVCB
answers (RFC 9460) so Chrome ≥103, Firefox, and Safari don't bypass
the AAAA filter via the HTTPS record path.
Local data is preserved: overrides, zones, the .numa proxy, and the
blocklist sinkhole keep whatever v6 addresses they configure — the
filter only kicks in on the cache/forward/recursive path. NODATA is
correct per RFC 2308 here; NXDOMAIN would incorrectly imply the name
doesn't exist for A queries either.
Off by default. Opt in via `filter_aaaa = true` under `[server]`.
Re-install failed with ETXTBSY (Text file busy) because std::fs::copy
can't overwrite a binary that's currently being executed by the
running service. Switch to copy-then-rename: write the new binary to
/usr/local/bin/numa.new, then rename over /usr/local/bin/numa. Rename
swaps the path while the running process keeps the old inode alive,
so DNS keeps serving from the previous binary until restart.
Bump systemctl start to restart so the new binary actually loads on
re-install (start is a no-op when the unit is already active, which
would silently leave the old binary running).
Locally verified the full CI sequence: install → curl → reinstall →
curl → uninstall → curl-fails. All three assertions pass.
Linux install_service_linux now does the {{exe_path}} substitution
inline because it uses the (potentially copied) binary path returned
by install_service_binary_linux, not current_exe(). The shared
replace_exe_path helper is dead on Linux — clippy -D warnings caught it.
Narrow the function to macos and split the placeholder test: keep the
"both templates contain {{exe_path}}" assertion as a cross-platform test
(catches placeholder removal on either file), keep the substitution test
gated to macos where the function lives.
DynamicUser=yes' transient account can only traverse world-x directories.
The CI binary at /home/runner/work/numa/numa/target/release/numa fails
exec with EACCES because /home/runner is mode 0700; same applies to a
build under /home/<user>/, ~/.cargo/bin, or any private $HOME tree.
install_service_binary_linux now walks the binary's path. If every
ancestor grants world-execute (Linuxbrew /home/linuxbrew is 0755,
/usr/local/bin is fine, install.sh layout works), keep the source
path so brew/distro upgrades propagate in place. Otherwise copy to
/usr/local/bin/numa and reference that in the unit.
Locally verified both branches in an Ubuntu 24.04 systemd container:
- CI-like /home/runner (0700) → copies + service binds 5380
- Brew-like /home/linuxbrew (0755) → keeps source path + service binds 5380
Integration-linux journalctl showed status=203/EXEC: systemd couldn't
exec /home/runner/work/numa/numa/target/release/numa because
ProtectHome=yes makes /home invisible to the sandboxed process. My
local Docker test passed because the binary was at /workspace, not
/home.
DynamicUser=yes already implies ProtectHome=read-only, which preserves
exec access to binaries living under /home (cargo install, source
builds, CI) while blocking writes to user $HOMEs. Keep that default
rather than over-restricting.
Follow-up worth tracking: install_service_linux could copy the binary
to /usr/local/bin/numa the way Windows does at windows_service_exe_path,
making the unit's ExecStart independent of where `numa install` was
invoked from — then we could set ProtectHome=yes again.
AUR installs never call `numa install` — PKGBUILD drops the unit straight
into /usr/lib/systemd/system and the user runs `systemctl enable numa`.
With User=numa the Rust installer's useradd code never fires there,
breaking Arch out of the box.
DynamicUser=yes sidesteps packaging entirely — systemd allocates a
transient UID per start and remaps StateDirectory ownership (including
legacy root-owned trees) automatically. Works on any modern systemd.
Drops the ensure_numa_user_linux/chown helpers plus NUMA_USER; the
unit file alone now captures the privilege-drop story.
Integration test failed with exit 7 on curl to /health after a successful
install — service started but never listened. The likely culprits are
MemoryDenyWriteExecute (breaks jemalloc/some crypto), SystemCallFilter
~@privileged @resources (blocks setrlimit and friends tokio may use),
and RestrictNamespaces/LockPersonality (occasional foot-guns).
Pull them and keep a conservative hardening set that's well-tested with
Rust network services: no-new-privs, protect-system/home, private tmp
and devices, protect-kernel-*, restrict-realtime/suid/address-families.
Layer the aggressive bits back in follow-up PRs once tested individually.
- numa.service: User=numa + CAP_NET_BIND_SERVICE + sandboxing block
(ProtectSystem=strict, PrivateTmp, seccomp @system-service, etc)
- install_service_linux: create numa system user + chown data_dir
before first start so TLS-cert generation and state writes land
on a numa-owned tree
Runtime verified root-free on Linux — network_watch_loop only reads
/etc/resolv.conf; all system-DNS mutation stays in the installer,
which continues to run as root via sudo.
The SRTT ordering + failure penalty path was UDP-only, so a DoT primary
in a forwarding-rule pool was never deprioritized on failure and all
DoT entries tied at INITIAL_SRTT_MS in the sort key. With [[forwarding]]
now accepting arrays of upstreams, DoT pools are a first-class case and
need the same healthiest-first behavior the default pool gets for UDP.
- Add Upstream::tracked_ip() → Some(ip) for Udp/Dot, None for Doh
(DoH has no stable IP — reqwest pools connections by hostname).
- Rewire the three SRTT call sites in forward_with_failover_raw.
- Hoist srtt.read() out of the candidate-scoring loop — one lock per
query instead of N (matters now that pools commonly have N>1).
- Drop unused #[derive(Debug)] on UpstreamPool and ForwardingRule.
- Regression tests: udp_failure_records_in_srtt + dot_failure_records_in_srtt.
Resolves src/main.rs conflict: serve loop was extracted into src/serve.rs on main (PR #107). Ported the forwarding-rule log change to serve.rs — fwd.upstream is now Vec<String>, logged with join(", ").
systemd-resolved has a ~40s reconfiguration stall after restart
(systemd #22521) that breaks the GHA runner's persistent connection
to results-receiver.actions.githubusercontent.com. Polling for DNS
recovery isn't enough since the .NET runner agent caches DNS at the
connection-pool level. Replace the broken stub-resolv symlink with a
direct upstream so DNS works instantly.
Move DNS recovery wait into the cleanup step (if: always) so it runs
regardless of test outcome. Use getent hosts loop instead of sleep+dig
to match what post-steps actually use for resolution.
systemd-resolved needs a moment to restore its stub listener after
the numa drop-in is removed. Without a wait, the runner can't resolve
GitHub's API to report job completion.
Release-build + install/verify/re-install/uninstall cycle on Linux and
macOS. Runs after lint/test passes (needs dependency). Cleanup step
uses if: always() to handle cancellation.
Extract parse_sc_registered and parse_sc_state as testable pure
functions. 8 new tests covering: service registration detection,
service state parsing, and Windows config_dir == data_dir invariant.
Stop the running service before disabling Dnscache so the port 53 probe
sees the real state (not Numa's own binding). Wait for SCM STOPPED
state before copying the binary to avoid os error 32 (file in use).
config_dir() on Windows now returns data_dir() (ProgramData) so config,
services.json, and log file are in the same place for both interactive
and service contexts. Service mode writes logs to numa.log via
env_logger pipe. Dashboard shows correct log path per OS.
Probe port 53 after disabling Dnscache instead of assuming reboot is
needed. Skip DNS redirect when port is blocked (service does it on
first boot). Fix readiness probe: TCP connect to API port instead of
broken UDP send_to that always succeeded.
service start/stop/restart/status now map to proper SCM operations
instead of re-running the full install/uninstall flow. On re-install,
stop the running service first so the binary can be overwritten.
The polling refresh replaced the entire allowlist panel innerHTML every
2 seconds, destroying the input field mid-typing. Users had to
paste-and-enter faster than the refresh interval — #106 reported this
as text "timing out and erasing."
Guard: skip renderAllowlist() when allowDomainInput has focus.
Switch to --long flag so format is always TAG-N-gSHA[-dirty], then
split from the right. Handles pre-release tags (v0.14.0-rc1) that
broke the previous left-split approach. Remove ineffective directory
watch on .git/refs/tags/. Trim comments.
Adds a build.rs that runs `git describe --tags --always --dirty` and
sets NUMA_BUILD_VERSION at compile time. A new `numa::version()` helper
returns the build version, falling back to CARGO_PKG_VERSION when git
is unavailable (source tarballs, Docker builds without .git).
Version strings:
tagged release: 0.13.1
commits ahead: 0.13.1+a87f907
uncommitted changes: 0.13.1+a87f907-dirty
no git: 0.13.1
Replaces all 6 inline env!("CARGO_PKG_VERSION") call sites with the
single version() function.