043a7e1ba5da32c291709d785f86d1fa668e5994
10 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
7efac85836 |
feat: wire-level forwarding, cache, request hedging, and DoH keepalive
Wire-level forwarding path skips DnsPacket parse/serialize on the hot path. Cache stores raw wire bytes with pre-scanned TTL offsets — patches ID + TTLs in-place on lookup instead of cloning parsed packets. Request hedging (Dean & Barroso "Tail at Scale") fires a second parallel request after a configurable delay (default 10ms) when the primary upstream stalls. DoH keepalive loop prevents idle HTTP/2 + TLS connection teardown. Recursive resolver now hedges across multiple NS addresses and caches NS delegation records to skip TLD re-queries. Integration test harness polls /blocking/stats instead of fixed sleep, eliminating the blocklist-download race condition. |
||
|
|
22bebb85a0 |
fix: config path advisory ignores XDG file on interactive root (#81) (#83)
Port-53 and TLS-data-dir advisories told users to create ~/.config/numa/numa.toml, but config_dir() routed root to /var/lib/numa/ and load_config never consulted the XDG path, so the file the user created was silently ignored. New suggested_config_path() helper prefers $HOME/.config/numa/ when HOME is set (and isn't "/" or empty), with config_dir() as lazy fallback. Used by both advisories and by load_config as an additional candidate, so the advised path is the path numa actually reads. Runtime state (services.json, TLS CA) stays in FHS — config_dir()/data_dir() are intentionally unchanged to keep continuity with the installed daemon. End-to-end replication + regression check in tests/docker/issue-81.sh: four scenarios (replication and existing-install, each against main and fix), all matching expectations. |
||
|
|
7d6b0ed568 |
feat: DoH server endpoint + DoT enabled by default (#79)
* chore: document multi-forwarder and cache warming in config and README Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: DNS-over-HTTPS server endpoint (RFC 8484) Serve DoH at POST /dns-query on the existing HTTPS proxy (port 443). Automatically enabled when proxy TLS is active — no config needed. Also fix zone map priority so local zones override RFC 6762 .local special-use handling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: cargo fmt Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: remove GoatCounter analytics from site GoatCounter domains (goatcounter.com, gc.zgo.at) are blocked by Hagezi Pro, which is Numa's default blocklist. A DNS privacy tool should not embed analytics that its own resolver blocks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: enable DoT listener by default DoT now starts automatically with `sudo numa`, matching the proxy and DoH which are already on by default. The self-signed CA infrastructure is shared with the proxy, so there is no additional setup. This makes `numa setup-phone` work out of the box. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
a6f23a5ddb |
fix: advisory + exit(1) when port 53 is already in use (#45) (#47)
* fix: advisory + exit(1) when port 53 is already in use (#45) Detect AddrInUse on bind, print a human-readable diagnostic explaining systemd-resolved / Dnscache as the likely cause and offer two concrete fixes (sudo numa install, or bind_addr on a non-privileged port), then exit(1) instead of surfacing a raw OS error. Adds tests/docker/smoke-port53.sh: end-to-end Docker test that pre-binds port 53 with a Python UDP socket and asserts the advisory + exit code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: collapse port53 advisory to single flat path The per-platform cause sentences were cosmetic — they didn't change the user's actions (install, or bind_addr on a non-privileged port), but they introduced duplicated "another process..." strings, a dead-from-CI branch (is_systemd_resolved_active() == true is never reached by any test), and a pub visibility bump on is_systemd_resolved_active for a single caller. Replace with one flat format! whose cause line mentions both systemd-resolved and the Windows DNS Client inline. The existing smoke test now exercises 100% of the function. is_systemd_resolved_active reverts to private. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
79ecb73d87 |
fix: use FHS-compliant /var/lib/numa as Linux data dir default (#43)
* fix: use FHS-compliant /var/lib/numa as Linux data dir default numa's default system-wide data directory was hardcoded to /usr/local/var/numa for all Unix platforms. This is the right path on macOS (Homebrew prefix convention) but non-FHS on Linux, where Arch / Fedora / Debian / etc. expect persistent state under /var/lib/<pkg>. The mismatch was invisible to existing users (numa creates the dir silently on first run) but immediately surfaces when packaging for a distro — see PR #33 (community contribution to add an Arch AUR package) which had to add fragile sed-based path patching at PKGBUILD build time. The fix moves the path decision into a small helper: - daemon_data_dir() — cfg-gated platform dispatch (linux/macos) - resolve_linux_data_dir() — pure function, takes "does X exist?" as parameters, returns the right path Linux behavior: - Fresh install → /var/lib/numa (FHS) - Upgrading from pre-v0.10.1 install → /usr/local/var/numa (legacy) - Both paths exist → /var/lib/numa (FHS wins) The legacy fallback is critical: existing v0.10.0 Linux users have their CA cert + services.json under /usr/local/var/numa. Returning the new path unconditionally would cause CA regeneration on upgrade, breaking every browser that had trusted the previous CA. The fallback is checked at startup via std::path::Path::exists, so the upgrade is seamless and zero-config. macOS behavior is unchanged — /usr/local/var/numa is still correct because Homebrew's prefix is /usr/local. Test coverage: - resolve_linux_data_dir is a pure function gated cfg(any(linux,test)) so the same code path is unit-tested on every platform's CI run. - Four tests cover all combinations of (legacy_exists, fhs_exists), asserting the migration logic stays correct under future edits. The default config in numa.toml is also updated to document the new per-platform default paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: end-to-end FHS path verification + simplify cleanup Two related changes from a /simplify pass and a follow-up testing finalization: 1. lib.rs cleanup (no behavior change): - Drop FHS_LINUX_DATA_DIR and LEGACY_LINUX_DATA_DIR consts. Both were used in only 4 places total and the unit tests already bypassed them with string literals, so they were over-engineering. Inline the strings in daemon_data_dir() and resolve_linux_data_dir(). - Trim narrating doc/comments on the helper and the test bodies. Keep only the non-obvious WHY (the macOS Homebrew note and the migration-keeps-legacy rationale). 2. tests/docker/smoke-arch.sh: - Cherry-picked the previously-uncommitted Arch compatibility smoke test from feat/smoke-arch. - Removed the [server] data_dir = "/tmp/numa-smoke" override from the test config so the script now exercises the DEFAULT data dir code path — which is exactly what the FHS fix touches. - Added a path assertion after the dig succeeds: verify that /var/lib/numa/ca.pem exists (FHS) and /usr/local/var/numa is absent (no accidental dual-creation on a fresh install). Verified end-to-end on archlinux:latest (Apple Silicon, Rosetta): ── building + running numa on archlinux:latest ── ── cargo build --release --locked ── Finished `release` profile [optimized] target(s) in 24.02s ── dig @127.0.0.1 -p 5354 google.com A ── 142.251.38.206 ── FHS path check ── ✓ CA cert at /var/lib/numa/ca.pem (FHS path) ✓ legacy path /usr/local/var/numa absent (fresh install used FHS) ── smoke-arch passed ── This closes the testing gap where the unit tests covered the path-decision LOGIC in isolation but nothing exercised the live wiring on a real Linux filesystem. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
039254280b |
fix: cross-platform CA trust (Arch/Fedora + Windows) (#41)
* fix: cross-platform CA trust (Arch/Fedora + Windows) Closes #35. trust_ca_linux now detects which trust store the distro ships and runs the matching refresh command, instead of hardcoding Debian's update-ca-certificates. Detection walks a const table in priority order, picking the first whose anchor dir exists: - debian: /usr/local/share/ca-certificates (update-ca-certificates) - pki: /etc/pki/ca-trust/source/anchors (update-ca-trust extract) - p11kit: /etc/ca-certificates/trust-source/anchors (trust extract-compat) Falls back with a clear error listing every backend tried. Adds Windows support via certutil -addstore Root / -delstore Root, removing the silent CA-trust gap on numa install (previously the service installed but the trust step quietly errored, leaving every HTTPS .numa request throwing browser warnings). Refactor: trust_ca and untrust_ca are now thin dispatchers calling per-platform helpers. CA_COMMON_NAME and CA_FILE_NAME are centralized in tls.rs and reused from system_dns.rs and api.rs. untrust_ca_linux no longer pre-checks file existence (TOCTOU) and skips the refresh when no file was actually removed. Test: tests/docker/install-trust.sh runs the install/uninstall contract against debian:stable, fedora:latest, and archlinux:latest in containers, asserting the cert lands in (and is removed from) the system bundle. All three pass locally. README notes the Firefox/NSS limitation (separate trust store). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: rustfmt fixes for trust_ca_linux helpers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: macOS CA trust contract test (manual) Adds tests/manual/install-trust-macos.sh — a sudo bash script that mirrors trust_ca_macos / untrust_ca_macos against a fixture cert with a unique CN. Designed to coexist with a running production numa: - Refuses to run if a real "Numa Local CA" is already in System.keychain (fail-closed protection for dogfood installs) - Uses a unique CN ("Numa Local CA Test <pid-timestamp>") so the test cert can never collide with production - Mirrors the by-hash deletion loop from untrust_ca_macos - Trap-cleanup on success or interrupt Lives under tests/manual/ to signal "host-mutating, dev-only" — distinct from tests/docker/install-trust.sh which is hermetic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: relax bail-out in macOS trust test (safe alongside production) The bail-out was overly defensive. The test cert uses a unique CN ("Numa Local CA Test <pid-ts>") that is strictly longer than the production CN, so `security find-certificate -c $TEST_CN` cannot substring-match the production cert. All deletes are by-hash, which can only target the test cert's specific hash. Coexistence is provably safe; document the reasoning in the header comment block and replace the refusal with an informational notice. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
6887c8e02e |
refactor: move data_dir override from env var to [server] TOML field
Reverts the NUMA_DATA_DIR env var added in the previous commit and replaces it with a [server] data_dir TOML field. Numa already has a well-developed config system; adding a parallel env-var mechanism for a single knob was wrong. The principle: TOML is for application behavior configuration. Env vars are for bootstrap values (HOME, SUDO_USER to discover paths before config loads) and standard ecosystem conventions (RUST_LOG). data_dir is neither — it's an app knob, so it belongs in the TOML. Changes: - lib.rs::data_dir() reverts to the platform-specific fallback only - config.rs adds `data_dir: Option<PathBuf>` to ServerConfig - main.rs resolves config.server.data_dir with fallback to numa::data_dir() and passes it to build_tls_config, then stores the resolved path on ctx.data_dir for downstream consumers - tls.rs::build_tls_config takes `data_dir: &Path` as an explicit parameter instead of calling crate::data_dir() behind the caller's back. regenerate_tls and dot.rs self_signed_tls now pass &ctx.data_dir, honoring whatever path the config resolved to - tests/integration.sh Suite 6 uses `data_dir = "$NUMA_DATA"` in its test TOML instead of the NUMA_DATA_DIR env var prefix - numa.toml gains a commented-out data_dir example No behavior change for existing production deployments (the default path is unchanged). Test harness is now fully config-driven, and containerized deploys can override data_dir via mount+config without needing env var injection. 127/127 unit tests pass, Suite 6 passes end-to-end. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
7f52bd8a32 |
test: Suite 6 — proxy + DoT coexistence, NUMA_DATA_DIR override
Adds integration test coverage for the realistic production shape where both the HTTPS proxy and DoT are enabled simultaneously. This was previously untested — every existing suite had either one or the other, so the interaction path was implicit. What Suite 6 verifies: - Both listeners bind without panic - DoT still resolves queries with the proxy enabled - Proxy HTTPS handshake still works with DoT enabled - Both certs validate against the same shared CA To run non-root, adds a NUMA_DATA_DIR env var override to data_dir() that lets callers point the CA/cert storage at any writable path. Useful beyond tests: containerized deployments, CI runners, dev testing without sudo. The fallback is the existing platform-specific path (unix: /usr/local/var/numa, windows: %PROGRAMDATA%\numa). Suite 6 sets NUMA_DATA_DIR=/tmp/numa-integration-data before starting numa, then trusts the generated CA at $NUMA_DATA_DIR/ca.pem for both kdig (DoT query) and openssl s_client (HTTPS proxy handshake) verification. All 6 suites, 32 checks, run non-root and pass locally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
c98e6c3ea9 |
fix: install rustls crypto provider when loading user DoT cert
Adds tests/integration.sh Suite 5 (DoT via kdig + openssl) and fixes a startup panic caught by it. Bug: when [dot] cert_path/key_path was set AND [proxy] was disabled, numa panicked on the first DoT handshake with "Could not automatically determine the process-level CryptoProvider from Rustls crate features". In normal deployments the proxy's build_tls_config installs the default provider as a side effect, masking the missing call in dot.rs::load_tls_config. Disable the proxy and the panic surfaces. Fix: call rustls::crypto::ring::default_provider().install_default() at the top of load_tls_config (no-op if already installed). Suite 5 exercises: - DoT listener binds on configured port - Resolves a local zone A record over TLS (kdig +tls) - Persistent connection reuse (kdig +keepopen, 3 queries, 1 handshake) - ALPN "dot" negotiation (openssl s_client -alpn dot) - ALPN mismatch rejected with no_application_protocol (openssl -alpn h2) Uses a pre-generated cert at /tmp so the test runs non-root. Skips gracefully if kdig or openssl aren't installed. Also: Dockerfile now EXPOSE 853/tcp so docker run -p 853:853 works out of the box when users enable DoT. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
a84f2e7f1d |
feat: recursive DNS + DNSSEC + TCP fallback (#17)
* feat: recursive resolution + full DNSSEC validation Numa becomes a true DNS resolver — resolves from root nameservers with complete DNSSEC chain-of-trust verification. Recursive resolution: - Iterative RFC 1034 from configurable root hints (13 default) - CNAME chasing (depth 8), referral following (depth 10) - A+AAAA glue extraction, IPv6 nameserver support - TLD priming: NS + DS + DNSKEY for 34 gTLDs + EU ccTLDs - Config: mode = "recursive" in [upstream], root_hints, prime_tlds DNSSEC (all 4 phases): - EDNS0 OPT pseudo-record (DO bit, 1232 payload per DNS Flag Day 2020) - DNSKEY, DS, RRSIG, NSEC, NSEC3 record types with wire read/write - Signature verification via ring: RSA/SHA-256, ECDSA P-256, Ed25519 - Chain-of-trust: zone DNSKEY → parent DS → root KSK (key tag 20326) - DNSKEY RRset self-signature verification (RRSIG(DNSKEY) by KSK) - RRSIG expiration/inception time validation - NSEC: NXDOMAIN gap proofs, NODATA type absence, wildcard denial - NSEC3: SHA-1 iterated hashing, closest encloser proof, hash range - Authority RRSIG verification for denial proofs - Config: [dnssec] enabled/strict (default false, opt-in) - AD bit on Secure, SERVFAIL on Bogus+strict - DnssecStatus cached per entry, ValidationStats logging Performance: - TLD chain pre-warmed on startup (root DNSKEY + TLD DS/DNSKEY) - Referral DS piggybacking from authority sections - DNSKEY prefetch before validation loop - Cold-cache validation: ~1 DNSKEY fetch (down from 5) - Benchmarks: RSA 10.9µs, ECDSA 174ns, DS verify 257ns Also: - write_qname fix for root domain "." (was producing malformed queries) - write_record_header() dedup, write_bytes() bulk writes - DnsRecord::domain() + query_type() accessors - UpstreamMode enum, DEFAULT_EDNS_PAYLOAD const - Real glue TTL (was hardcoded 3600) - DNSSEC restricted to recursive mode only Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: TCP fallback, query minimization, UDP auto-disable Transport resilience for restrictive networks (ISPs blocking UDP:53): - DNS-over-TCP fallback: UDP fail/truncation → automatic TCP retry - UDP auto-disable: after 3 consecutive failures, switch to TCP-first - IPv6 → TCP directly (UDP socket binds 0.0.0.0, can't reach IPv6) - Network change resets UDP detection for re-probing - Root hint rotation in TLD priming Privacy: - RFC 7816 query minimization: root servers see TLD only, not full name Code quality: - Merged find_starting_ns + find_starting_zone → find_closest_ns - Extracted resolve_ns_addrs_from_glue shared helper - Removed overall timeout wrapper (per-hop timeouts sufficient) - forward_tcp for DNS-over-TCP (RFC 1035 §4.2.2) Testing: - Mock TCP-only DNS server for fallback tests (no network needed) - tcp_fallback_resolves_when_udp_blocked - tcp_only_iterative_resolution - tcp_fallback_handles_nxdomain - udp_auto_disable_resets - Integration test suite (4 suites, 51 tests) - Network probe script (tests/network-probe.sh) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: DNSSEC verified badge in dashboard query log - Add dnssec field to QueryLogEntry, track validation status per query - DnssecStatus::as_str() for API serialization - Dashboard shows green checkmark next to DNSSEC-verified responses - Blog post: add "How keys get there" section, transport resilience section, trim code blocks, update What's Next Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use SVG shield for DNSSEC badge, update blog HTML Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: NS cache lookup from authorities, UDP re-probe, shield alignment - find_closest_ns checks authorities (not just answers) for NS records, fixing TLD priming cache misses that caused redundant root queries - Periodic UDP re-probe every 5min when disabled — re-enables UDP after switching from a restrictive network to an open one - Dashboard DNSSEC shield uses fixed-width container for alignment - Blog post: tuck key-tag into trust anchor paragraph Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: TCP single-write, mock server consistency, integration tests - TCP single-write fix: combine length prefix + message to avoid split segments that Microsoft/Azure DNS servers reject - Mock server (spawn_tcp_dns_server) updated to use single-write too - Tests: forward_tcp_wire_format, forward_tcp_single_segment_write - Integration: real-server checks for Microsoft/Office/Azure domains Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: recursive bar in dashboard, special-use domain interception Dashboard: - Add Recursive bar to resolution paths chart (cyan, distinct from Override) - Add RECURSIVE path tag style in query log Special-use domains (RFC 6761/6303/8880/9462): - .localhost → 127.0.0.1 (RFC 6761) - Private reverse PTR (10.x, 192.168.x, 172.16-31.x) → NXDOMAIN - _dns.resolver.arpa (DDR) → NXDOMAIN - ipv4only.arpa (NAT64) → 192.0.0.170/171 - mDNS service discovery for private ranges → NXDOMAIN Eliminates ~900ms SERVFAILs for macOS system queries that were hitting root servers unnecessarily. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: move generated blog HTML to site/blog/posts/, gitignore - Generated HTML now in site/blog/posts/ (gitignored) - CI workflow runs pandoc + make blog before deploy - Updated all internal blog links to /blog/posts/ path - blog/*.md remains the source of truth Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: review feedback — memory ordering, RRSIG time, NS resolution - Ordering::Relaxed → Acquire/Release for UDP_DISABLED/UDP_FAILURES (ARM correctness for cross-thread coordination) - RRSIG time validation: serial number arithmetic (RFC 4034 §3.1.5) + 300s clock skew fudge factor (matches BIND) - resolve_ns_addrs_from_glue collects addresses from ALL NS names, not just the first with glue (improves failover) - is_special_use_domain: eliminate 16 format! allocations per .in-addr.arpa query (parse octet instead) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: API endpoint tests, coverage target - 8 new axum handler tests: health, stats, query-log, overrides CRUD, cache, blocking stats, services CRUD, dashboard HTML - Tests use tower::oneshot — no network, no server startup - test_ctx() builds minimal ServerCtx for isolated testing - `make coverage` target (cargo-tarpaulin), separate from `make all` - 82 total tests (was 74) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |