Commit Graph

291 Commits

Author SHA1 Message Date
Razvan Dimescu
155c1c4da0 test: full-pipeline coverage for every resolve_query step
Test each pipeline stage in isolation through resolve_query:
- override takes precedence over all other paths
- localhost and *.localhost resolve to loopback
- local zone returns configured records
- .tld proxy resolves registered services to loopback
- blocklist sinkholes to 0.0.0.0
- cache hit returns stored response without upstream
2026-04-13 08:04:59 +03:00
Razvan Dimescu
b40004fe5e refactor: extract shared test infrastructure into testutil module
- test_ctx(): single ServerCtx builder, replaces 3 copies (ctx/api/dot)
- mock_upstream(): canned DNS response server for forwarding tests
- blackhole_upstream(): unresponsive socket for timeout tests
- Removes ~100 lines of duplicated 30-field struct literals
2026-04-13 07:56:47 +03:00
Razvan Dimescu
b8ddc16027 refactor: return QueryPath from resolve_query, add mock upstream to tests
resolve_query now returns (BytePacketBuffer, QueryPath) so callers
and tests can inspect the resolution path without reading the query
log. Production call sites (UDP, DoT, DoH) destructure and ignore it.

The forwarding test now uses a mock UDP upstream that replies with a
canned response, asserting QueryPath::Forwarded instead of != Local.
2026-04-13 07:51:14 +03:00
Razvan Dimescu
48f67be2f1 refactor: deduplicate test_ctx by delegating to test_ctx_with_forwarding 2026-04-13 07:39:55 +03:00
Razvan Dimescu
ca00846393 fix: forwarding rules override special-use NXDOMAIN for private PTR zones
Explicit [[forwarding]] rules now take precedence over the RFC 6303
special-use domain intercept. Previously, PTR queries for private
ranges (e.g. 168.192.in-addr.arpa) always returned local NXDOMAIN
even when a forwarding rule pointed them at a corporate DNS server.

Add full-pipeline resolve_query test harness (test_ctx + resolve_in_test)
and two tests covering both the default behavior and the override.

Closes #94
2026-04-13 07:36:53 +03:00
Razvan Dimescu
4d4e48bbd6 chore: bump version to 0.13.0 (tag: v0.13.0) 2026-04-13 01:05:20 +03:00
Razvan Dimescu
724c4a6017 Merge pull request #91 from razvandimescu/docs/readme-update
docs: update README with v0.13.0 features
2026-04-13 01:03:16 +03:00
Razvan Dimescu
2b29a44ee0 docs: remove unfair NextDNS comparison from performance section
Comparing a local cache (0.8ms) with a remote service (37ms) measures
network latency, not resolver quality. Any local resolver would
show the same advantage. Replaced with an AdGuard Home comparison,
which is a fair local-to-local benchmark.
2026-04-13 01:02:10 +03:00
Razvan Dimescu
588e5226fd Merge pull request #92 from razvandimescu/bench/vs-adguard
bench: add --vs-adguard comparison mode
2026-04-13 01:00:20 +03:00
Razvan Dimescu
501902d569 bench: add --vs-adguard mode for Numa vs AdGuard Home comparison
AdGuard Home runs on port 5457; both resolvers forward via DoH. Cached
queries tie at 0.1ms. On degraded networks hedging hurts p99 (28ms vs
10ms without): both requests pay the same high RTT, and there are no
random latency spikes for the hedged request to rescue. On clean
networks hedging wins.
2026-04-13 00:56:58 +03:00
Razvan Dimescu
77d2c8bbcd docs: update README comparison table, performance, and roadmap
- Comparison table: add DoH/DoT upstream, DoH server, request hedging,
  serve-stale + prefetch, conditional forwarding rows
- Performance: update with current benchmark numbers (0.1ms cached,
  47x NextDNS, p99 -28% vs Unbound)
- Roadmap: add hedging, serve-stale, conditional forwarding, DoT upstream
- Fix broken benchmarks link (bench/ → benches/)
2026-04-13 00:18:52 +03:00
Razvan Dimescu
274338e7f9 Merge pull request #88 from razvandimescu/fix/doh-loopback-san
fix: DoH endpoint accepts loopback, TLS cert includes IP SANs
2026-04-13 00:03:30 +03:00
Razvan Dimescu
305935ed98 style: rustfmt strip_port 2026-04-12 23:59:51 +03:00
Razvan Dimescu
bd505813b6 test: verify TLS cert SANs (wildcard, services, loopback, localhost, bare TLD)
Parse the generated DER cert with x509-parser to assert the exact SAN
set, catching silent try_into() failures that a params-level test
would miss.
2026-04-12 23:54:55 +03:00
Razvan Dimescu
115a55b199 fix: bracketed IPv6, localhost SAN, split host-check helpers
- is_doh_host split into strip_port + is_loopback_host + is_tld_match
- strip_port handles bracketed IPv6 ([::1]:443) and rejects bare IPv6
- Add [::1] to accepted loopback hosts, add localhost DNS SAN to cert
- Remove dead sans.is_empty() guard (loopback IPs always present)
2026-04-12 23:54:26 +03:00
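A minimal sketch of the host-check split described in the commit above, for illustration only. The function names come from the commit message, but the signatures, the choice to return the address without brackets, and the port validation are assumptions rather than the project's actual code.

```rust
use std::net::IpAddr;

/// True if `s` is a non-empty decimal port that fits in a u16.
fn is_port(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit()) && s.parse::<u16>().is_ok()
}

/// Strips an optional `:port` from a Host header value.
/// Bracketed IPv6 (`[::1]:443`) is accepted; bare IPv6 (`::1`) is rejected
/// because its colons cannot be told apart from a port separator.
pub fn strip_port(host: &str) -> Option<&str> {
    if let Some(rest) = host.strip_prefix('[') {
        let (addr, tail) = rest.split_once(']')?;
        if tail.is_empty() {
            return Some(addr);
        }
        return match tail.strip_prefix(':') {
            Some(port) if is_port(port) => Some(addr),
            _ => None,
        };
    }
    match host.split_once(':') {
        None => Some(host),
        Some((name, port)) if !name.is_empty() && is_port(port) => Some(name),
        Some(_) => None, // extra colons without brackets: bare IPv6
    }
}

/// True for `localhost` and loopback IP literals, with or without a port.
pub fn is_loopback_host(host: &str) -> bool {
    match strip_port(host) {
        Some(h) => {
            h.eq_ignore_ascii_case("localhost")
                || h.parse::<IpAddr>().map_or(false, |ip| ip.is_loopback())
        }
        None => false,
    }
}
```

Under these assumptions, `strip_port("[::1]:443")` yields `Some("::1")`, while a bare `::1` is rejected and must be written as `[::1]`.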
Razvan Dimescu
3665deb56b fix: accept loopback addresses for DoH and add IP SANs to TLS cert
The DoH endpoint rejected requests with Host: 127.0.0.1/::1/localhost,
and the generated TLS cert had no IP SANs — so browsers couldn't use
https://127.0.0.1/dns-query even with the CA trusted.

- is_doh_host now accepts 127.0.0.1, ::1, localhost (with optional port)
- TLS cert includes 127.0.0.1 and ::1 IP SANs, plus bare TLD DNS SAN

Closes #87
2026-04-12 23:54:26 +03:00
Razvan Dimescu
c074d728e9 Merge pull request #90 from razvandimescu/feat/wire-forwarding-hedging
feat: transport protocol tracking with dashboard visualization
2026-04-12 23:38:57 +03:00
Razvan Dimescu
2101dfcf17 feat: transport protocol tracking (UDP/TCP/DoT/DoH) with dashboard visualization
Thread Transport enum through resolve pipeline, record per-query
transport in stats and query log. Dashboard gets bar chart panel
with encryption %, transport column in query log, and filter dropdown.
2026-04-12 22:14:26 +03:00
Razvan Dimescu
27dc53aebb Merge pull request #85 from razvandimescu/feat/wire-forwarding-hedging
feat: wire-level forwarding, cache, and request hedging
2026-04-12 22:02:45 +03:00
Razvan Dimescu
8085c10687 docs: document hedge_ms, tls:// upstream, update max_entries default in numa.toml 2026-04-12 21:37:59 +03:00
Razvan Dimescu
02e1449a45 feat: enable request hedging for all upstream protocols
Hedging was DoH-only (hyper dispatch spike mitigation). Now applies to
UDP (rescues packet loss) and DoT (rescues TLS handshake stalls) too.
Same-upstream hedging: fires a second independent request after hedge_ms
delay. First response wins. Disable with hedge_ms = 0.
2026-04-12 21:34:47 +03:00
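The same-upstream hedging described in the commit above can be sketched with plain threads and a channel. This is an illustration under stated assumptions, not the project's implementation (which is async): `send_hedged`, its signature, and the choice to fire the hedge immediately when the primary fails early are all hypothetical.

```rust
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Runs `query`, then fires one more independent attempt if no successful
/// reply has arrived within `hedge_delay`. The first success wins.
/// A zero `hedge_delay` disables hedging (mirroring `hedge_ms = 0`).
pub fn send_hedged<T, F>(query: F, hedge_delay: Duration) -> Result<T, String>
where
    T: Send + 'static,
    F: Fn() -> Result<T, String> + Send + Sync + 'static,
{
    let query = Arc::new(query);
    let (tx, rx) = mpsc::channel::<Result<T, String>>();

    // Each attempt runs on its own thread and reports through the channel.
    let launch = |tx: mpsc::Sender<Result<T, String>>| {
        let query = Arc::clone(&query);
        thread::spawn(move || {
            let _ = tx.send(query()); // the receiver may already be gone
        });
    };

    launch(tx.clone());
    let mut last_err = String::from("no reply");

    if !hedge_delay.is_zero() {
        match rx.recv_timeout(hedge_delay) {
            Ok(Ok(reply)) => return Ok(reply), // primary answered in time
            Ok(Err(e)) => {
                last_err = e;
                launch(tx.clone()); // primary failed early: try once more
            }
            Err(_) => launch(tx.clone()), // primary is stalling: hedge
        }
    }
    drop(tx); // only the attempt threads hold senders now

    // Drain every outstanding attempt; an error from one must not hide a
    // success from the other.
    for result in rx {
        match result {
            Ok(reply) => return Ok(reply),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

Draining both attempts instead of returning on the first result is the behavior the earlier "warm-branch hedge bug" fix describes: a primary error should not drop a secondary that may still succeed.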
Razvan Dimescu
50828c411a fix: cold benchmark uses 1 round per domain for genuine cold measurements
With ROUNDS=10, only the first query per domain was truly cold — the
other 9 hit cached NS delegations at <1ms, diluting the median to
0.4ms. Now cold mode uses 1 round so every sample is a real cold
resolve. Also extracted compare_two_rounds to support per-mode rounds.
2026-04-12 21:00:24 +03:00
Razvan Dimescu
5184891985 fix: cold benchmark cache-busting with PID prefix and flush
Re-runs of --vs-unbound-cold were hitting stale cache entries from
prior runs. The static COUNTER reset to 0 each process, generating
the same c0.example.com subdomains. With the 1-hour stale window,
entries from 10 minutes ago served as stale hits.

Fix: prefix with PID (r{pid}-c{n}.domain) and flush Numa's cache
before cold benchmarks.
2026-04-12 20:50:04 +03:00
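The cache-busting scheme from the commit above could look roughly like this. The `r{pid}-c{n}.domain` pattern is taken from the commit message; the function name `cold_name` and the use of an atomic counter are assumptions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Restarts at 0 in every process, which is why the PID prefix is needed.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Builds a subdomain that is unique within this process (counter) and
/// across benchmark runs (process ID), so no earlier run can have cached it.
pub fn cold_name(domain: &str) -> String {
    let n = COUNTER.fetch_add(1, Ordering::Relaxed);
    format!("r{}-c{}.{}", std::process::id(), n, domain)
}
```

PIDs can eventually be reused by the operating system, which is presumably why the commit also flushes the cache before cold benchmarks.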
Razvan Dimescu
6d9ee14ea6 refactor: unify warm_stale/warm_domain, remove raw_wire alloc, add Freshness enum
- Extract refresh_entry in ctx.rs — warm_domain in main.rs now delegates
  to it instead of duplicating the resolve+cache logic (~40 lines removed)
- Eliminate unconditional .to_vec() of raw wire on every UDP/DoT query —
  pass &buffer.buf[..len] directly (zero-cost for cache hits)
- Replace bare bool stale flag with Freshness enum (Fresh/NearExpiry/Stale)
  making the three states self-documenting at every call site
2026-04-12 19:56:42 +03:00
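As an illustration of the three-state `Freshness` enum named in the commit above, the sketch below classifies a cache entry by age. The variant names come from the commit message; the 10% prefetch threshold and the 1-hour stale window are taken from the neighboring commits, and the `classify` function and its `Option` return are assumptions.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Freshness {
    /// Plenty of TTL left: serve as-is.
    Fresh,
    /// Less than 10% of the TTL remains: serve, and refresh in the background.
    NearExpiry,
    /// Expired but inside the stale window: serve with TTL=1 and refresh.
    Stale,
}

/// How long an expired entry may still be served.
const STALE_WINDOW: Duration = Duration::from_secs(3600);

/// Classifies an entry with original `ttl` that was cached `age` ago.
/// Returns `None` once the entry is past the stale window (a cache miss).
pub fn classify(ttl: Duration, age: Duration) -> Option<Freshness> {
    if age >= ttl {
        return if age - ttl <= STALE_WINDOW {
            Some(Freshness::Stale)
        } else {
            None
        };
    }
    let remaining = ttl - age;
    // remaining / ttl < 10%, rearranged to avoid division.
    if remaining * 10 < ttl {
        Some(Freshness::NearExpiry)
    } else {
        Some(Freshness::Fresh)
    }
}
```

With a 300-second TTL, an entry that is 280 seconds old has 20 seconds left (under 10%) and would be classified as `NearExpiry`.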
Razvan Dimescu
3c49b0e65d fix: deduplicate background refresh with per-domain guard
Multiple stale queries for the same domain now spawn only one background
refresh. A HashSet<(String, QueryType)> on ServerCtx tracks in-flight
refreshes; subsequent stale hits for the same key skip the spawn.
2026-04-12 19:49:23 +03:00
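A rough sketch of the per-domain guard described in the commit above. The `HashSet<(String, QueryType)>` key shape comes from the commit message; the `RefreshGuard` wrapper, its method names, and the two-variant `QueryType` are hypothetical stand-ins.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum QueryType {
    A,
    Aaaa,
}

/// Tracks which (domain, type) pairs already have a refresh in flight.
#[derive(Default)]
pub struct RefreshGuard {
    in_flight: Mutex<HashSet<(String, QueryType)>>,
}

impl RefreshGuard {
    /// Returns true if the caller should spawn the refresh, false if one
    /// is already running for this key.
    pub fn try_begin(&self, domain: &str, qtype: QueryType) -> bool {
        // HashSet::insert returns false when the key was already present.
        self.in_flight
            .lock()
            .unwrap()
            .insert((domain.to_string(), qtype))
    }

    /// Called when the background refresh completes, successfully or not.
    pub fn finish(&self, domain: &str, qtype: QueryType) {
        self.in_flight
            .lock()
            .unwrap()
            .remove(&(domain.to_string(), qtype));
    }
}
```

A stale hit would call `try_begin` and spawn the refresh only on `true`; the spawned task calls `finish` at the end so later stale hits can trigger a new refresh.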
Razvan Dimescu
8ef95383a2 feat: prefetch at <10% TTL remaining, add stale behavior tests
Entries with <10% TTL remaining are now marked stale on lookup,
triggering a background refresh before they expire. Combined with
the serve-stale + background refresh from the previous commit, this
means entries are proactively refreshed — matching Unbound's prefetch
behavior.
2026-04-12 19:46:14 +03:00
Razvan Dimescu
571ce2f013 feat: background refresh on stale cache hit (RFC 8767 revalidation)
When a cached entry is expired but within the 1-hour stale window,
serve it immediately with TTL=1 AND spawn a background re-resolve.
The next query gets a fresh entry instead of another stale serve.

Without this, stale entries were served repeatedly for up to an hour
with no refresh — effectively ignoring TTL.
2026-04-12 19:42:56 +03:00
Razvan Dimescu
043a7e1ba5 feat: raise cache default to 100K entries, evict stalest instead of dropping
The 10K cap was too conservative — the blocklist alone holds 400K domains.
At ~100 bytes per wire entry, 100K entries is ~10MB.

When the cache is full and evict_expired doesn't free enough slots,
evict_stalest removes the entry with the least remaining TTL instead of
silently discarding the new insert.
2026-04-12 19:23:28 +03:00
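The eviction order from the commit above, sketched on a simplified cache. The method names `evict_expired` and `evict_stalest` come from the commit message; the `Cache` struct, which stores only an expiry time per key, is a hypothetical stand-in for the real wire-level cache.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

pub struct Cache {
    max_entries: usize,
    expiry: HashMap<String, Instant>, // key -> time at which the entry expires
}

impl Cache {
    pub fn new(max_entries: usize) -> Self {
        Cache { max_entries, expiry: HashMap::new() }
    }

    pub fn len(&self) -> usize {
        self.expiry.len()
    }

    pub fn contains(&self, key: &str) -> bool {
        self.expiry.contains_key(key)
    }

    /// Inserts an entry; when full, frees space instead of dropping the insert.
    pub fn insert(&mut self, key: &str, ttl: Duration, now: Instant) {
        if !self.expiry.contains_key(key) && self.expiry.len() >= self.max_entries {
            self.evict_expired(now);
            if self.expiry.len() >= self.max_entries {
                self.evict_stalest();
            }
        }
        self.expiry.insert(key.to_string(), now + ttl);
    }

    /// Drops every entry whose TTL has already run out.
    fn evict_expired(&mut self, now: Instant) {
        self.expiry.retain(|_, exp| *exp > now);
    }

    /// Drops the single entry with the least remaining TTL.
    fn evict_stalest(&mut self) {
        let victim = self
            .expiry
            .iter()
            .min_by_key(|(_, exp)| **exp)
            .map(|(key, _)| key.clone());
        if let Some(key) = victim {
            self.expiry.remove(&key);
        }
    }
}
```

The linear scan in `evict_stalest` keeps the sketch short; whether the real cache scans or maintains an ordered index is not stated in the commit.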
Razvan Dimescu
05d5a5145f refactor: remove unused extract_question and read_wire_qname from wire.rs 2026-04-12 18:46:03 +03:00
Razvan Dimescu
15058aea83 bench: add --vs-nextdns, --vs-unbound-cold modes with mode validation
- --vs-nextdns: Numa local cache vs NextDNS cloud (45.90.28.0)
- --vs-unbound-cold: unique random subdomains, no record cache hits
- check_numa_mode validates forward/recursive mode before running
- numa-bench-recursive.toml config for cold benchmarks
2026-04-12 18:41:09 +03:00
Razvan Dimescu
628ed00074 refactor: extract cache_and_parse, remove dead truncation log, restore TCP_TIMEOUT to 400ms 2026-04-12 18:40:46 +03:00
Razvan Dimescu
85cff052a4 fix: restore TCP_TIMEOUT to 400ms (test race was the real issue) 2026-04-12 18:40:46 +03:00
Razvan Dimescu
67b472fea7 fix: serialize tests that share global UDP_DISABLED state
The tcp_only_iterative_resolution, tcp_fallback_resolves_when_udp_blocked,
tcp_fallback_handles_nxdomain, and udp_auto_disable_resets tests all mutate
global UDP_DISABLED / UDP_FAILURES atomics. Under cargo test parallelism,
udp_auto_disable_resets would reset the flag mid-flight causing other tests
to attempt UDP against TCP-only mock servers and time out.

Fix: static Mutex serializes tests that depend on global UDP state.
Also: tcp_only_iterative_resolution now calls forward_tcp directly,
removing its dependence on the flag entirely.
2026-04-12 18:40:46 +03:00
Razvan Dimescu
700cca9cb6 style: rustfmt warm_domain 2026-04-12 18:40:46 +03:00
Razvan Dimescu
f705f8c49f fix: bump TCP_TIMEOUT to 800ms to fix flaky CI test 2026-04-12 18:40:46 +03:00
Razvan Dimescu
17a1a6ddba refactor: remove forward_with_failover duplication, fix warm-branch hedge bug
- Remove forward_with_failover (parsed): warm_domain now uses _raw + insert_wire
- forward_udp delegates to forward_udp_raw (single UDP socket implementation)
- forward_query uses unified _raw path for all protocols
- Fix send_query_hedged warm branch: bare select! dropped secondary on primary
  error instead of waiting for it — now drains both futures like the cold branch
- Remove pointless raw_len = len rename
2026-04-12 18:40:46 +03:00
Razvan Dimescu
72b540a44a feat: wire-level cache, serve-stale, raw wire passthrough
- Cache stores raw DNS wire bytes + TTL offsets (2.4x memory reduction)
- Serve-stale (RFC 8767): expired entries returned with TTL=1 for 1hr
- handle_query captures raw_len from recv_from for zero-copy forwarding
- resolve_query accepts raw wire bytes, forwards without re-serializing
- wire.rs: TTL offset scanner, ID/TTL patching, question extraction
- 52 wire tests + 16 cache regression tests
2026-04-12 18:40:46 +03:00
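To make the "raw wire bytes + TTL offsets" idea in the commit above concrete, here is a minimal sketch of a TTL offset scanner and an in-place patcher. It relies only on the standard DNS message layout (12-byte header, ID in the first two bytes, 4-byte big-endian TTL in each resource record). The function names are hypothetical, and skipping the OPT pseudo-record is an assumption about what a real scanner would do.

```rust
/// Advances past a (possibly compressed) domain name starting at `pos`.
fn skip_name(buf: &[u8], mut pos: usize) -> Option<usize> {
    loop {
        let len = *buf.get(pos)? as usize;
        if len & 0xC0 == 0xC0 {
            return Some(pos + 2); // compression pointer ends the name
        }
        if len == 0 {
            return Some(pos + 1); // root label ends the name
        }
        pos += 1 + len;
    }
}

/// Returns the byte offset of every TTL field in a DNS response.
pub fn scan_ttl_offsets(buf: &[u8]) -> Option<Vec<usize>> {
    if buf.len() < 12 {
        return None;
    }
    let count = |i: usize| u16::from_be_bytes([buf[i], buf[i + 1]]) as usize;
    let questions = count(4);
    let records = count(6) + count(8) + count(10);

    let mut pos = 12;
    for _ in 0..questions {
        pos = skip_name(buf, pos)? + 4; // QTYPE + QCLASS
    }
    let mut offsets = Vec::with_capacity(records);
    for _ in 0..records {
        pos = skip_name(buf, pos)?;
        let fixed = buf.get(pos..pos + 10)?; // TYPE, CLASS, TTL, RDLENGTH
        let rtype = u16::from_be_bytes([fixed[0], fixed[1]]);
        let rdlen = u16::from_be_bytes([fixed[8], fixed[9]]) as usize;
        if rtype != 41 {
            offsets.push(pos + 4); // OPT (41) reuses the TTL field for flags
        }
        pos += 10 + rdlen;
    }
    if pos <= buf.len() { Some(offsets) } else { None }
}

/// Rewrites the message ID and every TTL in place on a cache hit.
pub fn patch_id_and_ttls(wire: &mut [u8], offsets: &[usize], id: u16, ttl: u32) {
    wire[0..2].copy_from_slice(&id.to_be_bytes());
    for &off in offsets {
        wire[off..off + 4].copy_from_slice(&ttl.to_be_bytes());
    }
}
```

Scanning once at insert time means a cache hit costs one copy plus a few 4-byte writes, with no packet parsing; serving stale with TTL=1 is the same patch with `ttl = 1`.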
Razvan Dimescu
c1b651aa63 chore: remove obsolete bash benchmark script 2026-04-12 18:40:46 +03:00
Razvan Dimescu
5d9a3a809b feat: DoT client, recursive optimization, bench refactor
- Add DoT forwarding client (tls://IP#hostname upstream config)
- Recursive: cache NS delegations, serve-stale (RFC 8767), parallel
  NS queries on cold, no TCP fallback on individual UDP timeouts,
  400ms NS/TCP timeout (down from 800/1500ms)
- Reduce recursive p99 from 2367ms to 402ms (vs Unbound's 148ms)
- Refactor benchmark suite: generic compare_two engine, delete
  one-off diagnostics (1969 → 750 lines)
- Code cleanup: forward_query delegates to _raw, Option<String>
  for tls_name, saturating_sub for ns_idx
2026-04-12 18:40:46 +03:00
Razvan Dimescu
7efac85836 feat: wire-level forwarding, cache, request hedging, and DoH keepalive
Wire-level forwarding path skips DnsPacket parse/serialize on the hot
path. Cache stores raw wire bytes with pre-scanned TTL offsets — patches
ID + TTLs in-place on lookup instead of cloning parsed packets.

Request hedging (Dean & Barroso "Tail at Scale") fires a second
parallel request after a configurable delay (default 10ms) when
the primary upstream stalls. DoH keepalive loop prevents idle
HTTP/2 + TLS connection teardown.

Recursive resolver now hedges across multiple NS addresses and
caches NS delegation records to skip TLD re-queries.

Integration test harness polls /blocking/stats instead of fixed
sleep, eliminating the blocklist-download race condition.
2026-04-12 18:39:48 +03:00
Razvan Dimescu
4f46550283 Merge pull request #89 from razvandimescu/feat/dot-client
feat: DoT (DNS over TLS) client upstream
2026-04-12 18:39:17 +03:00
Razvan Dimescu
05baad0cc0 feat: DoT (DNS over TLS) client upstream
Adds tls:// upstream support for forwarding queries over DNS-over-TLS
(RFC 7858). Parses tls://IP:PORT#hostname format, with default port 853.

- New Upstream::Dot variant with TLS connector
- forward_dot: length-prefixed DNS over TLS stream
- build_dot_connector: system root CAs via webpki-roots
- parse_upstream handles tls:// prefix

Example config:
  address = ["tls://9.9.9.9#dns.quad9.net"]
2026-04-12 18:35:06 +03:00
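A sketch of how the `tls://IP:PORT#hostname` format from the commit above might be parsed. The default port 853 and the format itself come from the commit message; the `DotUpstream` struct and `parse_dot_upstream` function are hypothetical and do not reflect the project's actual `parse_upstream` or `Upstream::Dot` definitions.

```rust
use std::net::{IpAddr, SocketAddr};

#[derive(Debug, PartialEq)]
pub struct DotUpstream {
    pub addr: SocketAddr,
    /// Name to verify in the server certificate, if one was given after `#`.
    pub tls_name: Option<String>,
}

/// Parses `tls://IP[:PORT][#hostname]`, defaulting to port 853 (RFC 7858).
pub fn parse_dot_upstream(s: &str) -> Option<DotUpstream> {
    let rest = s.strip_prefix("tls://")?;
    let (endpoint, tls_name) = match rest.split_once('#') {
        Some((_, name)) if name.is_empty() => return None,
        Some((endpoint, name)) => (endpoint, Some(name.to_string())),
        None => (rest, None),
    };
    // Try "ip:port" first, then fall back to a bare IP on the default port.
    let addr = endpoint
        .parse::<SocketAddr>()
        .ok()
        .or_else(|| endpoint.parse::<IpAddr>().ok().map(|ip| SocketAddr::new(ip, 853)))?;
    Some(DotUpstream { addr, tls_name })
}
```

With this sketch, the example config value `tls://9.9.9.9#dns.quad9.net` resolves to `9.9.9.9:853` with `dns.quad9.net` as the TLS name.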
Razvan Dimescu
7047767dc2 feat: per-suffix conditional forwarding rules (#82) (#84)
* feat: per-suffix conditional forwarding rules in numa.toml (#82)

Adds a `[[forwarding]]` config section so users can explicitly route
domain suffixes to specific upstreams. Config-declared rules take
precedence over auto-discovered rules (macOS scutil, Linux search
domains) via first-match semantics.

Example — the reporter's reverse-DNS case:

  [[forwarding]]
  suffix = "168.192.in-addr.arpa"
  upstream = "100.90.1.63:5361"

Bare IPs default to port 53. IPv6 is supported via
parse_upstream_addr. ForwardingRule::new() constructor replaces
direct struct-literal construction, and make_rule() now delegates
to parse_upstream_addr to fix a latent IPv6 parsing bug.

* feat: accept suffix as string or array in [[forwarding]] rules

Reuses existing string_or_vec deserializer so users can write:
  suffix = ["168.192.in-addr.arpa", "onsite"]
instead of repeating [[forwarding]] blocks per suffix.

* style: rustfmt

* refactor: drop config_count from merge_forwarding_rules return

Log config rules directly from config.forwarding before merging,
keeping the merge API clean of logging concerns.
2026-04-12 06:12:08 +03:00
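The first-match semantics described in the commit above can be sketched as follows. `ForwardingRule::new`, `parse_upstream_addr`, and `merge_forwarding_rules` are named in the commit message, but the signatures here are assumptions, and `pick_upstream` and the label-boundary suffix match are illustrative additions.

```rust
use std::net::{IpAddr, SocketAddr};

pub struct ForwardingRule {
    pub suffix: String,
    pub upstream: SocketAddr,
}

/// Accepts "ip:port", "[v6]:port", or a bare IP (which defaults to port 53).
pub fn parse_upstream_addr(s: &str) -> Option<SocketAddr> {
    s.parse::<SocketAddr>()
        .ok()
        .or_else(|| s.parse::<IpAddr>().ok().map(|ip| SocketAddr::new(ip, 53)))
}

impl ForwardingRule {
    pub fn new(suffix: &str, upstream: &str) -> Option<Self> {
        Some(ForwardingRule {
            suffix: suffix.trim_matches('.').to_ascii_lowercase(),
            upstream: parse_upstream_addr(upstream)?,
        })
    }

    /// Matches the suffix itself or any name below it, on a label boundary.
    pub fn matches(&self, qname: &str) -> bool {
        let q = qname.trim_end_matches('.').to_ascii_lowercase();
        q == self.suffix || q.ends_with(&format!(".{}", self.suffix))
    }
}

/// Config-declared rules go first so they win under first-match lookup.
pub fn merge_forwarding_rules(
    config: Vec<ForwardingRule>,
    discovered: Vec<ForwardingRule>,
) -> Vec<ForwardingRule> {
    config.into_iter().chain(discovered).collect()
}

/// Returns the upstream of the first rule whose suffix matches `qname`.
pub fn pick_upstream(rules: &[ForwardingRule], qname: &str) -> Option<SocketAddr> {
    rules.iter().find(|r| r.matches(qname)).map(|r| r.upstream)
}
```

In this sketch, precedence falls out of list order alone: no priority field is needed because lookup stops at the first matching rule.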
Razvan Dimescu
22bebb85a0 fix: config path advisory ignores XDG file on interactive root (#81) (#83)
Port-53 and TLS-data-dir advisories told users to create
~/.config/numa/numa.toml, but config_dir() routed root to
/var/lib/numa/ and load_config never consulted the XDG path, so
the file the user created was silently ignored.

New suggested_config_path() helper prefers $HOME/.config/numa/
when HOME is set (and isn't "/" or empty), with config_dir() as
lazy fallback. Used by both advisories and by load_config as an
additional candidate, so the advised path is the path numa
actually reads. Runtime state (services.json, TLS CA) stays in
FHS — config_dir()/data_dir() are intentionally unchanged to
keep continuity with the installed daemon.

End-to-end replication + regression check in
tests/docker/issue-81.sh: four scenarios (replication and
existing-install, each against main and fix), all matching
expectations.
2026-04-12 02:17:33 +03:00
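The path preference described in the commit above reduces to a small pure function. The name `suggested_config_path` and the HOME rules come from the commit message; passing HOME and the fallback in as parameters (rather than reading the environment and calling `config_dir()` directly) is an assumption made to keep the sketch testable.

```rust
use std::path::{Path, PathBuf};

/// Prefers `$HOME/.config/numa/numa.toml` when HOME is usable; otherwise
/// falls back to the daemon's config directory. The fallback is a closure
/// so it is only evaluated when actually needed.
pub fn suggested_config_path(
    home: Option<&str>,
    fallback_dir: impl FnOnce() -> PathBuf,
) -> PathBuf {
    match home {
        // An empty HOME or "/" gives no meaningful per-user location.
        Some(h) if !h.is_empty() && h != "/" => {
            Path::new(h).join(".config/numa/numa.toml")
        }
        _ => fallback_dir().join("numa.toml"),
    }
}
```

A caller would pass `std::env::var("HOME").ok().as_deref()` and a closure wrapping `config_dir()`; using the same helper for both the advisory text and the config loader is what keeps the advised path and the path actually read from drifting apart.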
Razvan Dimescu
289f2b973b chore: remove built blog HTML from tracking (built by CI)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:10:13 +03:00
Razvan Dimescu
fb4cbe0b2a chore: update DoT blog post — mark DoH server as shipped in v0.12.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:08:09 +03:00
Razvan Dimescu
2de1bc2efc chore: bump version to 0.12.0 (tag: v0.12.0) 2026-04-11 12:15:40 +03:00
Razvan Dimescu
156b68de87 fix: replace unscannable QR art with placeholder in blog post (#80)
The Unicode block-character QR code in the DoT blog post can't be
scanned by phone cameras due to HTML font metrics distorting the grid.
Replace with a bordered placeholder box — the dashboard screenshot
already shows a working QR.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 04:17:46 +03:00
Razvan Dimescu
7d6b0ed568 feat: DoH server endpoint + DoT enabled by default (#79)
* chore: document multi-forwarder and cache warming in config and README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: DNS-over-HTTPS server endpoint (RFC 8484)

Serve DoH at POST /dns-query on the existing HTTPS proxy (port 443).
Automatically enabled when proxy TLS is active — no config needed.
Also fix zone map priority so local zones override RFC 6762 .local
special-use handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove GoatCounter analytics from site

GoatCounter domains (goatcounter.com, gc.zgo.at) are blocked by
Hagezi Pro, which is Numa's default blocklist. A DNS privacy tool
should not embed analytics that its own resolver blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: enable DoT listener by default

DoT now starts automatically with `sudo numa`, matching the proxy and
DoH which are already on by default. The self-signed CA infrastructure
is shared with the proxy, so there is no additional setup. This makes
`numa setup-phone` work out of the box.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 04:06:17 +03:00
Razvan Dimescu
7770129589 feat: cache warming — proactive DNS resolution for configured domains (#78)
Resolves A + AAAA at startup for domains listed in [cache] warm,
then re-resolves before TTL expiry (at 75% elapsed). Keeps critical
domains always hot in cache with zero client-visible latency.

Closes #34 (item 4)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 01:14:04 +03:00