- Switch overrides from HashMap to BTreeMap — deterministic iteration by
type, drops the manual sort when logging.
- Rename the flat_map closure's inner `ips` to `addrs` to stop shadowing
the outer Vec<String>.
- Trim the Suite 8 TEST-NET-1 comment to keep the "why" and drop
mechanism narration.
- Drop a redundant sleep 1 after wait — wait already blocks on exit.
Suite 8 now ends with a config using RFC 5737 TEST-NET-1 IPs as
relay_ip/target_ip, started briefly so the bootstrap resolver logs its
override map. Asserts both host=IP pairs land in that map — closing the
gap flagged on PR #126 (zero-plain-DNS-leak for ODoH endpoints was only
unit-tested).
Also: NumaResolver::new now logs the override map at INFO when non-empty,
so operators can verify their ODoH bootstrap without needing DEBUG level.
When numa is its own system DNS resolver (HAOS add-on, Pi-hole-style
container, /etc/resolv.conf → 127.0.0.1), every numa-originated HTTPS
connection — DoH upstream, ODoH relay/target, blocklist CDN — routed
its hostname through getaddrinfo() back to numa itself. Cold boot
deadlocked; steady state taxed every new TCP connection. 0.14.1's
retry-with-backoff masked the startup race but not the underlying
self-loop.
NumaResolver implements reqwest::dns::Resolve with two lanes:
- Per-host overrides (ODoH relay_ip/target_ip) short-circuit DNS
entirely, preserving ODoH's zero-plain-DNS-leak property.
- Otherwise: A+AAAA in parallel via UDP to IP-literal bootstrap
servers, with TCP fallback for UDP-hostile networks.
Bootstrap IPs come from upstream.fallback (IP-literal filtered,
hostnames skipped with a warning). Empty fallback yields the
hardcoded default [9.9.9.9, 1.1.1.1]; the chosen source is logged
at startup so the silent default is visible.
doh_keepalive_loop now fires its first tick immediately, and
keepalive_doh logs failures at WARN — bootstrap issues surface
within ~100ms of boot instead of on the first client query.
Distinct from UpstreamPool.fallback (client-query failover) which
stays untouched: client queries with no configured fallback still
SERVFAIL on primary failure rather than silently shadow-routing.
Reproducer: tests/docker/self-resolver-loop.sh. Before: 0 blocklist
domains, 3072ms SERVFAIL. After: 397k domains, 118ms NOERROR.
Client (mode = "odoh"): URL-query target routing per RFC 9230 §5,
/.well-known/odohconfigs TTL cache with 60s backoff on failure, HPKE
seal/open via odoh-rs, strict-mode default that SERVFAILs on relay
failure instead of silently downgrading. Host-equality config
validation rejects same-operator relay/target pairs.
Relay (`numa relay [PORT]`): axum server with /relay + /health.
SSRF-hardened hostname validator (RFC 1035 ASCII + dot + dash),
4 KiB body cap at the axum layer, 5s full-transaction timeout, and
static 502 on target failure (reqwest internals logged, not leaked).
Aggregate counters only — no per-request logs.
Observability: new `UpstreamTransport { Udp, Doh, Dot, Odoh }`
orthogonal to `QueryPath`, so /stats can tally wire protocols
symmetrically. Recursive mode records `Some(Udp)` for honest
"bytes egressing in cleartext" accounting.
Tests: Suite 8 exercises the client end-to-end via Frank Denis's
public relay + Cloudflare target; Suite 9 exercises `numa relay`
forwarding + guards against Cloudflare as the real far end. Full
probe script at tests/probe-odoh-ecosystem.sh verifies the entire
public ODoH ecosystem (4 targets + 1 relay per DNSCrypt's curated
list — confirms deploying Numa's relay doubles global supply).
Suite 7 exercises the full pipeline end-to-end: A resolves, AAAA returns
NODATA, local [[zones]] AAAA bypasses the filter, and HTTPS ipv6hint is
stripped from a real cloudflare.com response. A second config run with
the flag unset guards against network-failure false-positives.
SUITES=N (comma list) runs a subset, e.g. `SUITES=7 bash tests/integration.sh`
skips suites 1-6 for fast iteration.
Three scenarios CI cannot run: every advertised port is functional (DNS
resolves, TLS chain validates against numa's CA, HTTP/API respond), CA
fingerprint survives upgrade from pre-drop layout, binary staging
fallback from a 0700 source dir. Self-bootstraps a privileged
systemd-as-PID1 container — no dependency on long-lived test containers.
MainPID user assertion retries until comm=numa to avoid a race where
systemctl reports active while MainPID still points at a transitional
process.
Wire-level forwarding path skips DnsPacket parse/serialize on the hot
path. Cache stores raw wire bytes with pre-scanned TTL offsets — patches
ID + TTLs in-place on lookup instead of cloning parsed packets.
Request hedging (Dean & Barroso "Tail at Scale") fires a second
parallel request after a configurable delay (default 10ms) when
the primary upstream stalls. DoH keepalive loop prevents idle
HTTP/2 + TLS connection teardown.
Recursive resolver now hedges across multiple NS addresses and
caches NS delegation records to skip TLD re-queries.
Integration test harness polls /blocking/stats instead of fixed
sleep, eliminating the blocklist-download race condition.
Port-53 and TLS-data-dir advisories told users to create
~/.config/numa/numa.toml, but config_dir() routed root to
/var/lib/numa/ and load_config never consulted the XDG path, so
the file the user created was silently ignored.
New suggested_config_path() helper prefers $HOME/.config/numa/
when HOME is set (and isn't "/" or empty), with config_dir() as
lazy fallback. Used by both advisories and by load_config as an
additional candidate, so the advised path is the path numa
actually reads. Runtime state (services.json, TLS CA) stays in
FHS — config_dir()/data_dir() are intentionally unchanged to
keep continuity with the installed daemon.
End-to-end replication + regression check in
tests/docker/issue-81.sh: four scenarios (replication and
existing-install, each against main and fix), all matching
expectations.
* chore: document multi-forwarder and cache warming in config and README
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: DNS-over-HTTPS server endpoint (RFC 8484)
Serve DoH at POST /dns-query on the existing HTTPS proxy (port 443).
Automatically enabled when proxy TLS is active — no config needed.
Also fix zone map priority so local zones override RFC 6762 .local
special-use handling.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: cargo fmt
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: remove GoatCounter analytics from site
GoatCounter domains (goatcounter.com, gc.zgo.at) are blocked by
Hagezi Pro, which is Numa's default blocklist. A DNS privacy tool
should not embed analytics that its own resolver blocks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: enable DoT listener by default
DoT now starts automatically with `sudo numa`, matching the proxy and
DoH which are already on by default. The self-signed CA infrastructure
is shared with the proxy, so there is no additional setup. This makes
`numa setup-phone` work out of the box.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: advisory + exit(1) when port 53 is already in use (#45)
Detect AddrInUse on bind, print a human-readable diagnostic explaining
systemd-resolved / Dnscache as the likely cause and offer two concrete
fixes (sudo numa install, or bind_addr on a non-privileged port), then
exit(1) instead of surfacing a raw OS error.
Adds tests/docker/smoke-port53.sh: end-to-end Docker test that
pre-binds port 53 with a Python UDP socket and asserts the advisory +
exit code.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: collapse port53 advisory to single flat path
The per-platform cause sentences were cosmetic — they didn't change
the user's actions (install, or bind_addr on a non-privileged port),
but they introduced duplicated "another process..." strings, a
dead-from-CI branch (is_systemd_resolved_active() == true is never
reached by any test), and a pub visibility bump on
is_systemd_resolved_active for a single caller.
Replace with one flat format! whose cause line mentions both
systemd-resolved and the Windows DNS Client inline. The existing
smoke test now exercises 100% of the function.
is_systemd_resolved_active reverts to private.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use FHS-compliant /var/lib/numa as Linux data dir default
numa's default system-wide data directory was hardcoded to
/usr/local/var/numa for all Unix platforms. This is the right path on
macOS (Homebrew prefix convention) but non-FHS on Linux, where Arch /
Fedora / Debian / etc. expect persistent state under /var/lib/<pkg>.
The mismatch was invisible to existing users (numa creates the dir
silently on first run) but immediately surfaces when packaging for a
distro — see PR #33 (community contribution to add an Arch AUR package)
which had to add fragile sed-based path patching at PKGBUILD build time.
The fix moves the path decision into a small helper:
- daemon_data_dir() — cfg-gated platform dispatch (linux/macos)
- resolve_linux_data_dir() — pure function, takes "does X exist?"
as parameters, returns the right path
Linux behavior:
- Fresh install → /var/lib/numa (FHS)
- Upgrading from pre-v0.10.1 install → /usr/local/var/numa (legacy)
- Both paths exist → /var/lib/numa (FHS wins)
The legacy fallback is critical: existing v0.10.0 Linux users have
their CA cert + services.json under /usr/local/var/numa. Returning
the new path unconditionally would cause CA regeneration on upgrade,
breaking every browser that had trusted the previous CA. The fallback
is checked at startup via std::path::Path::exists, so the upgrade is
seamless and zero-config.
macOS behavior is unchanged — /usr/local/var/numa is still correct
because Homebrew's prefix is /usr/local.
Test coverage:
- resolve_linux_data_dir is a pure function gated cfg(any(linux,test))
so the same code path is unit-tested on every platform's CI run.
- Four tests cover all combinations of (legacy_exists, fhs_exists),
asserting the migration logic stays correct under future edits.
The default config in numa.toml is also updated to document the new
per-platform default paths.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: end-to-end FHS path verification + simplify cleanup
Two related changes from a /simplify pass and a follow-up testing
finalization:
1. lib.rs cleanup (no behavior change):
- Drop FHS_LINUX_DATA_DIR and LEGACY_LINUX_DATA_DIR consts. Both
were used in only 4 places total and the unit tests already
bypassed them with string literals, so they were over-engineering.
Inline the strings in daemon_data_dir() and resolve_linux_data_dir().
- Trim narrating doc/comments on the helper and the test bodies.
Keep only the non-obvious WHY (the macOS Homebrew note and the
migration-keeps-legacy rationale).
2. tests/docker/smoke-arch.sh:
- Cherry-picked the previously-uncommitted Arch compatibility smoke
test from feat/smoke-arch.
- Removed the [server] data_dir = "/tmp/numa-smoke" override from
the test config so the script now exercises the DEFAULT data dir
code path — which is exactly what the FHS fix touches.
- Added a path assertion after the dig succeeds: verify that
/var/lib/numa/ca.pem exists (FHS) and /usr/local/var/numa is
absent (no accidental dual-creation on a fresh install).
Verified end-to-end on archlinux:latest (Apple Silicon, Rosetta):
── building + running numa on archlinux:latest ──
── cargo build --release --locked ──
Finished `release` profile [optimized] target(s) in 24.02s
── dig @127.0.0.1 -p 5354 google.com A ──
142.251.38.206
── FHS path check ──
✓ CA cert at /var/lib/numa/ca.pem (FHS path)
✓ legacy path /usr/local/var/numa absent (fresh install used FHS)
── smoke-arch passed ──
This closes the testing gap where the unit tests covered the
path-decision LOGIC in isolation but nothing exercised the live
wiring on a real Linux filesystem.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: cross-platform CA trust (Arch/Fedora + Windows)
Closes#35.
trust_ca_linux now detects which trust store the distro ships and
runs the matching refresh command, instead of hardcoding Debian's
update-ca-certificates. Detection walks a const table in priority
order, picking the first whose anchor dir exists:
- debian: /usr/local/share/ca-certificates (update-ca-certificates)
- pki: /etc/pki/ca-trust/source/anchors (update-ca-trust extract)
- p11kit: /etc/ca-certificates/trust-source/anchors (trust extract-compat)
Falls back with a clear error listing every backend tried.
Adds Windows support via certutil -addstore Root / -delstore Root,
removing the silent CA-trust gap on numa install (previously the
service installed but the trust step quietly errored, leaving every
HTTPS .numa request throwing browser warnings).
Refactor: trust_ca and untrust_ca are now thin dispatchers calling
per-platform helpers. CA_COMMON_NAME and CA_FILE_NAME are centralized
in tls.rs and reused from system_dns.rs and api.rs. untrust_ca_linux
no longer pre-checks file existence (TOCTOU) and skips the refresh
when no file was actually removed.
Test: tests/docker/install-trust.sh runs the install/uninstall
contract against debian:stable, fedora:latest, and archlinux:latest
in containers, asserting the cert lands in (and is removed from)
the system bundle. All three pass locally.
README notes the Firefox/NSS limitation (separate trust store).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: rustfmt fixes for trust_ca_linux helpers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: macOS CA trust contract test (manual)
Adds tests/manual/install-trust-macos.sh — a sudo bash script that
mirrors trust_ca_macos / untrust_ca_macos against a fixture cert with
a unique CN. Designed to coexist with a running production numa:
- Refuses to run if a real "Numa Local CA" is already in System.keychain
(fail-closed protection for dogfood installs)
- Uses a unique CN ("Numa Local CA Test <pid-timestamp>") so the test
cert can never collide with production
- Mirrors the by-hash deletion loop from untrust_ca_macos
- Trap-cleanup on success or interrupt
Lives under tests/manual/ to signal "host-mutating, dev-only" — distinct
from tests/docker/install-trust.sh which is hermetic.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: relax bail-out in macOS trust test (safe alongside production)
The bail-out was overly defensive. The test cert uses a unique CN
("Numa Local CA Test <pid-ts>") that is strictly longer than the
production CN, so `security find-certificate -c $TEST_CN` cannot
substring-match the production cert. All deletes are by-hash, which
can only target the test cert's specific hash. Coexistence is
provably safe; document the reasoning in the header comment block
and replace the refusal with an informational notice.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverts the NUMA_DATA_DIR env var added in the previous commit and
replaces it with a [server] data_dir TOML field. Numa already has a
well-developed config system; adding a parallel env-var mechanism
for a single knob was wrong.
The principle: TOML is for application behavior configuration. Env
vars are for bootstrap values (HOME, SUDO_USER to discover paths
before config loads) and standard ecosystem conventions (RUST_LOG).
data_dir is neither — it's an app knob, so it belongs in the TOML.
Changes:
- lib.rs::data_dir() reverts to the platform-specific fallback only
- config.rs adds `data_dir: Option<PathBuf>` to ServerConfig
- main.rs resolves config.server.data_dir with fallback to
numa::data_dir() and passes it to build_tls_config, then stores the
resolved path on ctx.data_dir for downstream consumers
- tls.rs::build_tls_config takes `data_dir: &Path` as an explicit
parameter instead of calling crate::data_dir() behind the caller's
back. regenerate_tls and dot.rs self_signed_tls now pass
&ctx.data_dir, honoring whatever path the config resolved to
- tests/integration.sh Suite 6 uses `data_dir = "$NUMA_DATA"` in its
test TOML instead of the NUMA_DATA_DIR env var prefix
- numa.toml gains a commented-out data_dir example
No behavior change for existing production deployments (the default
path is unchanged). Test harness is now fully config-driven, and
containerized deploys can override data_dir via mount+config without
needing env var injection.
127/127 unit tests pass, Suite 6 passes end-to-end.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds integration test coverage for the realistic production shape
where both the HTTPS proxy and DoT are enabled simultaneously. This
was previously untested — every existing suite had either one or the
other, so the interaction path was implicit.
What Suite 6 verifies:
- Both listeners bind without panic
- DoT still resolves queries with the proxy enabled
- Proxy HTTPS handshake still works with DoT enabled
- Both certs validate against the same shared CA
To run non-root, adds a NUMA_DATA_DIR env var override to data_dir()
that lets callers point the CA/cert storage at any writable path.
Useful beyond tests: containerized deployments, CI runners, dev
testing without sudo. The fallback is the existing platform-specific
path (unix: /usr/local/var/numa, windows: %PROGRAMDATA%\numa).
Suite 6 sets NUMA_DATA_DIR=/tmp/numa-integration-data before
starting numa, then trusts the generated CA at $NUMA_DATA_DIR/ca.pem
for both kdig (DoT query) and openssl s_client (HTTPS proxy
handshake) verification.
All 6 suites, 32 checks, run non-root and pass locally.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds tests/integration.sh Suite 5 (DoT via kdig + openssl) and
fixes a startup panic caught by it.
Bug: when [dot] cert_path/key_path was set AND [proxy] was disabled,
numa panicked on the first DoT handshake with "Could not
automatically determine the process-level CryptoProvider from Rustls
crate features". In normal deployments the proxy's build_tls_config
installs the default provider as a side effect, masking the missing
call in dot.rs::load_tls_config. Disable the proxy and the panic
surfaces. Fix: call
rustls::crypto::ring::default_provider().install_default() at the
top of load_tls_config (no-op if already installed).
Suite 5 exercises:
- DoT listener binds on configured port
- Resolves a local zone A record over TLS (kdig +tls)
- Persistent connection reuse (kdig +keepopen, 3 queries, 1 handshake)
- ALPN "dot" negotiation (openssl s_client -alpn dot)
- ALPN mismatch rejected with no_application_protocol (openssl -alpn h2)
Uses a pre-generated cert at /tmp so the test runs non-root.
Skips gracefully if kdig or openssl aren't installed.
Also: Dockerfile now EXPOSE 853/tcp so docker run -p 853:853 works
out of the box when users enable DoT.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>