The tcp_only_iterative_resolution, tcp_fallback_resolves_when_udp_blocked,
tcp_fallback_handles_nxdomain, and udp_auto_disable_resets tests all mutate
global UDP_DISABLED / UDP_FAILURES atomics. Under cargo test parallelism,
udp_auto_disable_resets would reset the flag mid-flight causing other tests
to attempt UDP against TCP-only mock servers and time out.
Fix: static Mutex serializes tests that depend on global UDP state.
Also: tcp_only_iterative_resolution now calls forward_tcp directly,
removing its dependence on the flag entirely.
- Remove forward_with_failover (parsed): warm_domain now uses _raw + insert_wire
- forward_udp delegates to forward_udp_raw (single UDP socket implementation)
- forward_query uses unified _raw path for all protocols
- Fix send_query_hedged warm branch: bare select! dropped secondary on primary
error instead of waiting for it — now drains both futures like the cold branch
- Remove pointless raw_len = len rename
Wire-level forwarding path skips DnsPacket parse/serialize on the hot
path. Cache stores raw wire bytes with pre-scanned TTL offsets — patches
ID + TTLs in-place on lookup instead of cloning parsed packets.
Request hedging (Dean & Barroso "Tail at Scale") fires a second
parallel request after a configurable delay (default 10ms) when
the primary upstream stalls. DoH keepalive loop prevents idle
HTTP/2 + TLS connection teardown.
Recursive resolver now hedges across multiple NS addresses and
caches NS delegation records to skip TLD re-queries.
Integration test harness polls /blocking/stats instead of fixed
sleep, eliminating the blocklist-download race condition.
- Add 10s timeout on TLS handshake — prevents clients from holding a
semaphore permit without completing the handshake
- Add IDLE_TIMEOUT on payload read_exact — prevents slowloris after
sending a valid length prefix then trickling bytes
- Extract accept_loop() shared between start_dot and tests — eliminates
duplicated accept logic that could drift
- Add 5s timeout on TCP reads in recursive test mock server
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: auto recursive mode, fix Linux install
Auto mode (new default): probes a root server on startup; uses
recursive resolution if outbound DNS works, falls back to Quad9 DoH
if blocked. Dashboard shows mode indicator (green/yellow).
Linux install fixes:
- Add DNSStubListener=no to resolved drop-in (frees port 53)
- Configure DNS before starting service (correct ordering)
- Skip 127.0.0.53 in upstream detection
- `numa install` now does everything (service + DNS + CA)
- `numa uninstall` mirrors install (stop service + restore DNS)
- Extract is_loopback_or_stub() for consistent filtering
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: enable DNSSEC validation by default
With recursive as the default mode, DNSSEC validation completes the
trustless resolution chain. Strict mode remains off by default.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: forward search domains to VPC resolver on Linux
Parse search/domain lines from resolv.conf and create conditional
forwarding rules to the original nameserver or AWS VPC resolver
(169.254.169.253). Fixes internal hostname resolution on cloud VMs
where recursive mode can't resolve private DNS zones.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: single-pass resolv.conf parsing, eliminate redundancies
Parse resolv.conf once for both upstream and search domains instead
of 2-3 reads. Extract CLOUD_VPC_RESOLVER constant. Use &'static str
for mode in StatsResponse. Remove dead read_upstream_from_file.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: macOS install health check, harden recursive probe
Verify numa is listening (API port) before redirecting system DNS on
macOS — if the service fails to start (e.g. port 53 in use), unload
the service and abort instead of breaking DNS. Probe up to 3 root
hints before declaring recursive mode unavailable. Validate IPs from
resolvectl to avoid IPv6 fragment extraction. Extract DEFAULT_API_PORT
constant.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: widen make_rule cfg gate to include Linux
make_rule was gated to macOS-only but discover_linux() calls it for
search domain forwarding rules. CI failed on Linux with E0425.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: forward mode as default, recursive opt-in
Forward mode (transparent proxy to system DNS) is now the default.
Recursive and auto modes are explicit opt-in via config. This avoids
bypassing corporate DNS policies, captive portals, VPC private zones,
and parental controls on first install.
- Move #[default] from Auto to Forward on UpstreamMode
- DNSSEC defaults to off (no-op in forward mode)
- 3-way match in main: Forward/Recursive/Auto with clean separation
- Post-install message suggests mode = "recursive" for sovereignty
- Update README, site, and launch drafts messaging
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Add DnsPacket::query(id, domain, qtype) constructor; replace mock_query,
make_query, and 4 inline constructions across ctx/forward/recursive/api
- Add record_to_addr() in recursive.rs; replace 4 identical A/AAAA match
blocks with filter_map one-liners
- Add sinkhole_record() in ctx.rs; consolidate localhost and blocklist
A/AAAA branching into single calls
- Remove now-unused DnsQuestion imports
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: in-flight query coalescing for recursive resolver
When multiple queries for the same (domain, qtype) arrive concurrently
and all miss the cache, only the first triggers recursive resolution.
Subsequent queries wait on a broadcast channel for the result.
Prevents thundering herd where N concurrent cache misses each
independently walk the full NS chain, compounding timeouts.
Uses InflightGuard (Drop impl) to guarantee map cleanup on
panic/cancellation — prevents permanent SERVFAIL poisoning.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: add InflightMap type alias for clippy
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add COALESCED query path and coalescing tests
Followers in the inflight coalescing path now log as COALESCED instead
of RECURSIVE, making it visible in the dashboard when queries were
deduplicated vs independently resolved. Adds 10 tests covering
InflightGuard cleanup, broadcast mechanics, and concurrent handle_query
coalescing through a mock TCP DNS server.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: cargo fmt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: extract acquire_inflight, rewrite tests against real code
Move Disposition enum and inflight acquisition logic into a standalone
acquire_inflight() function. Rewrite 4 tests that were exercising tokio
primitives to call the real coalescing code path instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: SRTT-based nameserver selection for recursive resolver
BIND-style Smoothed RTT (EWMA) tracking per NS IP address. The resolver
learns which nameservers respond fastest and prefers them, eliminating
cascading timeouts from slow/unreachable IPv6 servers.
- New src/srtt.rs: SrttCache with record_rtt, record_failure, sort_by_rtt
- EWMA formula: new = (old * 7 + sample) / 8, 5s failure penalty, 5min decay
- TCP penalty (+100ms) lets SRTT naturally deprioritize IPv6-over-TCP
- Enabled flag embedded in SrttCache (no-op when disabled)
- Batch eviction (64 entries) for O(1) amortized writes at capacity
- Configurable via [upstream] srtt = true/false (default: true)
- Benchmark script: scripts/benchmark.sh (full, cold, warm, compare-all)
- Benchmarks show 12x avg improvement, 0% queries >1s (was 58%)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: show DNSSEC and SRTT status in dashboard + API
Add dnssec and srtt boolean fields to /stats API response.
Display on/off indicators in the dashboard footer.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: apply SRTT decay before EWMA so recovered servers rehabilitate
Without decay-before-EWMA, a server penalized at 5000ms stayed near
that value even after recovery — the stale raw penalty was used as the
EWMA base instead of the decayed estimate. Extract decayed_srtt()
helper and call it in record_rtt() before the smoothing step.
Also restores removed "why" comments in send_query / resolve_recursive.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add install/upgrade instructions, smarter benchmark priming
README: document `numa install`, `numa service`, Homebrew upgrade,
and `make deploy` workflows. Benchmark: replace fixed `sleep 4` with
`wait_for_priming` that polls cache entry count for stability.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>