8d28ae9bbea78d7ccfdd964a95f872ea4a3caa05
6 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
cb54ab3dfc |
fix: harden DoT listener against slowloris and stale handshakes
- Add 10s timeout on TLS handshake — prevents clients from holding a semaphore permit without completing the handshake - Add IDLE_TIMEOUT on payload read_exact — prevents slowloris after sending a valid length prefix then trickling bytes - Extract accept_loop() shared between start_dot and tests — eliminates duplicated accept logic that could drift - Add 5s timeout on TCP reads in recursive test mock server Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
98da440c84 |
feat: forward-by-default, auto recursive mode, Linux install fixes (#27)
* feat: auto recursive mode, fix Linux install Auto mode (new default): probes a root server on startup; uses recursive resolution if outbound DNS works, falls back to Quad9 DoH if blocked. Dashboard shows mode indicator (green/yellow). Linux install fixes: - Add DNSStubListener=no to resolved drop-in (frees port 53) - Configure DNS before starting service (correct ordering) - Skip 127.0.0.53 in upstream detection - `numa install` now does everything (service + DNS + CA) - `numa uninstall` mirrors install (stop service + restore DNS) - Extract is_loopback_or_stub() for consistent filtering Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: enable DNSSEC validation by default With recursive as the default mode, DNSSEC validation completes the trustless resolution chain. Strict mode remains off by default. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: forward search domains to VPC resolver on Linux Parse search/domain lines from resolv.conf and create conditional forwarding rules to the original nameserver or AWS VPC resolver (169.254.169.253). Fixes internal hostname resolution on cloud VMs where recursive mode can't resolve private DNS zones. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: single-pass resolv.conf parsing, eliminate redundancies Parse resolv.conf once for both upstream and search domains instead of 2-3 reads. Extract CLOUD_VPC_RESOLVER constant. Use &'static str for mode in StatsResponse. Remove dead read_upstream_from_file. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: macOS install health check, harden recursive probe Verify numa is listening (API port) before redirecting system DNS on macOS — if the service fails to start (e.g. port 53 in use), unload the service and abort instead of breaking DNS. Probe up to 3 root hints before declaring recursive mode unavailable. Validate IPs from resolvectl to avoid IPv6 fragment extraction. Extract DEFAULT_API_PORT constant. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: widen make_rule cfg gate to include Linux make_rule was gated to macOS-only but discover_linux() calls it for search domain forwarding rules. CI failed on Linux with E0425. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: forward mode as default, recursive opt-in Forward mode (transparent proxy to system DNS) is now the default. Recursive and auto modes are explicit opt-in via config. This avoids bypassing corporate DNS policies, captive portals, VPC private zones, and parental controls on first install. - Move #[default] from Auto to Forward on UpstreamMode - DNSSEC defaults to off (no-op in forward mode) - 3-way match in main: Forward/Recursive/Auto with clean separation - Post-install message suggests mode = "recursive" for sovereignty - Update README, site, and launch drafts messaging Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
49535568d9 |
refactor: deduplicate query builders, record extraction, sinkhole records (#22)
- Add DnsPacket::query(id, domain, qtype) constructor; replace mock_query, make_query, and 4 inline constructions across ctx/forward/recursive/api - Add record_to_addr() in recursive.rs; replace 4 identical A/AAAA match blocks with filter_map one-liners - Add sinkhole_record() in ctx.rs; consolidate localhost and blocklist A/AAAA branching into single calls - Remove now-unused DnsQuestion imports Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
30e46e549c |
feat: in-flight query coalescing with COALESCED path (#20)
* feat: in-flight query coalescing for recursive resolver When multiple queries for the same (domain, qtype) arrive concurrently and all miss the cache, only the first triggers recursive resolution. Subsequent queries wait on a broadcast channel for the result. Prevents thundering herd where N concurrent cache misses each independently walk the full NS chain, compounding timeouts. Uses InflightGuard (Drop impl) to guarantee map cleanup on panic/cancellation — prevents permanent SERVFAIL poisoning. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: add InflightMap type alias for clippy Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add COALESCED query path and coalescing tests Followers in the inflight coalescing path now log as COALESCED instead of RECURSIVE, making it visible in the dashboard when queries were deduplicated vs independently resolved. Adds 10 tests covering InflightGuard cleanup, broadcast mechanics, and concurrent handle_query coalescing through a mock TCP DNS server. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: cargo fmt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: extract acquire_inflight, rewrite tests against real code Move Disposition enum and inflight acquisition logic into a standalone acquire_inflight() function. Rewrite 4 tests that were exercising tokio primitives to call the real coalescing code path instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
06d4e91cd2 |
feat: SRTT-based nameserver selection (#19)
* feat: SRTT-based nameserver selection for recursive resolver BIND-style Smoothed RTT (EWMA) tracking per NS IP address. The resolver learns which nameservers respond fastest and prefers them, eliminating cascading timeouts from slow/unreachable IPv6 servers. - New src/srtt.rs: SrttCache with record_rtt, record_failure, sort_by_rtt - EWMA formula: new = (old * 7 + sample) / 8, 5s failure penalty, 5min decay - TCP penalty (+100ms) lets SRTT naturally deprioritize IPv6-over-TCP - Enabled flag embedded in SrttCache (no-op when disabled) - Batch eviction (64 entries) for O(1) amortized writes at capacity - Configurable via [upstream] srtt = true/false (default: true) - Benchmark script: scripts/benchmark.sh (full, cold, warm, compare-all) - Benchmarks show 12x avg improvement, 0% queries >1s (was 58%) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: show DNSSEC and SRTT status in dashboard + API Add dnssec and srtt boolean fields to /stats API response. Display on/off indicators in the dashboard footer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: apply SRTT decay before EWMA so recovered servers rehabilitate Without decay-before-EWMA, a server penalized at 5000ms stayed near that value even after recovery — the stale raw penalty was used as the EWMA base instead of the decayed estimate. Extract decayed_srtt() helper and call it in record_rtt() before the smoothing step. Also restores removed "why" comments in send_query / resolve_recursive. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add install/upgrade instructions, smarter benchmark priming README: document `numa install`, `numa service`, Homebrew upgrade, and `make deploy` workflows. Benchmark: replace fixed `sleep 4` with `wait_for_priming` that polls cache entry count for stability. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
a84f2e7f1d |
feat: recursive DNS + DNSSEC + TCP fallback (#17)
* feat: recursive resolution + full DNSSEC validation Numa becomes a true DNS resolver — resolves from root nameservers with complete DNSSEC chain-of-trust verification. Recursive resolution: - Iterative RFC 1034 from configurable root hints (13 default) - CNAME chasing (depth 8), referral following (depth 10) - A+AAAA glue extraction, IPv6 nameserver support - TLD priming: NS + DS + DNSKEY for 34 gTLDs + EU ccTLDs - Config: mode = "recursive" in [upstream], root_hints, prime_tlds DNSSEC (all 4 phases): - EDNS0 OPT pseudo-record (DO bit, 1232 payload per DNS Flag Day 2020) - DNSKEY, DS, RRSIG, NSEC, NSEC3 record types with wire read/write - Signature verification via ring: RSA/SHA-256, ECDSA P-256, Ed25519 - Chain-of-trust: zone DNSKEY → parent DS → root KSK (key tag 20326) - DNSKEY RRset self-signature verification (RRSIG(DNSKEY) by KSK) - RRSIG expiration/inception time validation - NSEC: NXDOMAIN gap proofs, NODATA type absence, wildcard denial - NSEC3: SHA-1 iterated hashing, closest encloser proof, hash range - Authority RRSIG verification for denial proofs - Config: [dnssec] enabled/strict (default false, opt-in) - AD bit on Secure, SERVFAIL on Bogus+strict - DnssecStatus cached per entry, ValidationStats logging Performance: - TLD chain pre-warmed on startup (root DNSKEY + TLD DS/DNSKEY) - Referral DS piggybacking from authority sections - DNSKEY prefetch before validation loop - Cold-cache validation: ~1 DNSKEY fetch (down from 5) - Benchmarks: RSA 10.9µs, ECDSA 174ns, DS verify 257ns Also: - write_qname fix for root domain "." (was producing malformed queries) - write_record_header() dedup, write_bytes() bulk writes - DnsRecord::domain() + query_type() accessors - UpstreamMode enum, DEFAULT_EDNS_PAYLOAD const - Real glue TTL (was hardcoded 3600) - DNSSEC restricted to recursive mode only Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: TCP fallback, query minimization, UDP auto-disable Transport resilience for restrictive networks (ISPs blocking UDP:53): - DNS-over-TCP fallback: UDP fail/truncation → automatic TCP retry - UDP auto-disable: after 3 consecutive failures, switch to TCP-first - IPv6 → TCP directly (UDP socket binds 0.0.0.0, can't reach IPv6) - Network change resets UDP detection for re-probing - Root hint rotation in TLD priming Privacy: - RFC 7816 query minimization: root servers see TLD only, not full name Code quality: - Merged find_starting_ns + find_starting_zone → find_closest_ns - Extracted resolve_ns_addrs_from_glue shared helper - Removed overall timeout wrapper (per-hop timeouts sufficient) - forward_tcp for DNS-over-TCP (RFC 1035 §4.2.2) Testing: - Mock TCP-only DNS server for fallback tests (no network needed) - tcp_fallback_resolves_when_udp_blocked - tcp_only_iterative_resolution - tcp_fallback_handles_nxdomain - udp_auto_disable_resets - Integration test suite (4 suites, 51 tests) - Network probe script (tests/network-probe.sh) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: DNSSEC verified badge in dashboard query log - Add dnssec field to QueryLogEntry, track validation status per query - DnssecStatus::as_str() for API serialization - Dashboard shows green checkmark next to DNSSEC-verified responses - Blog post: add "How keys get there" section, transport resilience section, trim code blocks, update What's Next Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use SVG shield for DNSSEC badge, update blog HTML Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: NS cache lookup from authorities, UDP re-probe, shield alignment - find_closest_ns checks authorities (not just answers) for NS records, fixing TLD priming cache misses that caused redundant root queries - Periodic UDP re-probe every 5min when disabled — re-enables UDP after switching from a restrictive network to an open one - Dashboard DNSSEC shield uses fixed-width container for alignment - Blog post: tuck key-tag into trust anchor paragraph Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: TCP single-write, mock server consistency, integration tests - TCP single-write fix: combine length prefix + message to avoid split segments that Microsoft/Azure DNS servers reject - Mock server (spawn_tcp_dns_server) updated to use single-write too - Tests: forward_tcp_wire_format, forward_tcp_single_segment_write - Integration: real-server checks for Microsoft/Office/Azure domains Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: recursive bar in dashboard, special-use domain interception Dashboard: - Add Recursive bar to resolution paths chart (cyan, distinct from Override) - Add RECURSIVE path tag style in query log Special-use domains (RFC 6761/6303/8880/9462): - .localhost → 127.0.0.1 (RFC 6761) - Private reverse PTR (10.x, 192.168.x, 172.16-31.x) → NXDOMAIN - _dns.resolver.arpa (DDR) → NXDOMAIN - ipv4only.arpa (NAT64) → 192.0.0.170/171 - mDNS service discovery for private ranges → NXDOMAIN Eliminates ~900ms SERVFAILs for macOS system queries that were hitting root servers unnecessarily. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: move generated blog HTML to site/blog/posts/, gitignore - Generated HTML now in site/blog/posts/ (gitignored) - CI workflow runs pandoc + make blog before deploy - Updated all internal blog links to /blog/posts/ path - blog/*.md remains the source of truth Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: review feedback — memory ordering, RRSIG time, NS resolution - Ordering::Relaxed → Acquire/Release for UDP_DISABLED/UDP_FAILURES (ARM correctness for cross-thread coordination) - RRSIG time validation: serial number arithmetic (RFC 4034 §3.1.5) + 300s clock skew fudge factor (matches BIND) - resolve_ns_addrs_from_glue collects addresses from ALL NS names, not just the first with glue (improves failover) - is_special_use_domain: eliminate 16 format! allocations per .in-addr.arpa query (parse octet instead) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: API endpoint tests, coverage target - 8 new axum handler tests: health, stats, query-log, overrides CRUD, cache, blocking stats, services CRUD, dashboard HTML - Tests use tower::oneshot — no network, no server startup - test_ctx() builds minimal ServerCtx for isolated testing - `make coverage` target (cargo-tarpaulin), separate from `make all` - 82 total tests (was 74) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |