feat: recursive DNS + DNSSEC + TCP fallback (#17)
* feat: recursive resolution + full DNSSEC validation Numa becomes a true DNS resolver — resolves from root nameservers with complete DNSSEC chain-of-trust verification. Recursive resolution: - Iterative RFC 1034 from configurable root hints (13 default) - CNAME chasing (depth 8), referral following (depth 10) - A+AAAA glue extraction, IPv6 nameserver support - TLD priming: NS + DS + DNSKEY for 34 gTLDs + EU ccTLDs - Config: mode = "recursive" in [upstream], root_hints, prime_tlds DNSSEC (all 4 phases): - EDNS0 OPT pseudo-record (DO bit, 1232 payload per DNS Flag Day 2020) - DNSKEY, DS, RRSIG, NSEC, NSEC3 record types with wire read/write - Signature verification via ring: RSA/SHA-256, ECDSA P-256, Ed25519 - Chain-of-trust: zone DNSKEY → parent DS → root KSK (key tag 20326) - DNSKEY RRset self-signature verification (RRSIG(DNSKEY) by KSK) - RRSIG expiration/inception time validation - NSEC: NXDOMAIN gap proofs, NODATA type absence, wildcard denial - NSEC3: SHA-1 iterated hashing, closest encloser proof, hash range - Authority RRSIG verification for denial proofs - Config: [dnssec] enabled/strict (default false, opt-in) - AD bit on Secure, SERVFAIL on Bogus+strict - DnssecStatus cached per entry, ValidationStats logging Performance: - TLD chain pre-warmed on startup (root DNSKEY + TLD DS/DNSKEY) - Referral DS piggybacking from authority sections - DNSKEY prefetch before validation loop - Cold-cache validation: ~1 DNSKEY fetch (down from 5) - Benchmarks: RSA 10.9µs, ECDSA 174ns, DS verify 257ns Also: - write_qname fix for root domain "." (was producing malformed queries) - write_record_header() dedup, write_bytes() bulk writes - DnsRecord::domain() + query_type() accessors - UpstreamMode enum, DEFAULT_EDNS_PAYLOAD const - Real glue TTL (was hardcoded 3600) - DNSSEC restricted to recursive mode only Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: TCP fallback, query minimization, UDP auto-disable Transport resilience for restrictive networks (ISPs blocking UDP:53): - DNS-over-TCP fallback: UDP fail/truncation → automatic TCP retry - UDP auto-disable: after 3 consecutive failures, switch to TCP-first - IPv6 → TCP directly (UDP socket binds 0.0.0.0, can't reach IPv6) - Network change resets UDP detection for re-probing - Root hint rotation in TLD priming Privacy: - RFC 7816 query minimization: root servers see TLD only, not full name Code quality: - Merged find_starting_ns + find_starting_zone → find_closest_ns - Extracted resolve_ns_addrs_from_glue shared helper - Removed overall timeout wrapper (per-hop timeouts sufficient) - forward_tcp for DNS-over-TCP (RFC 1035 §4.2.2) Testing: - Mock TCP-only DNS server for fallback tests (no network needed) - tcp_fallback_resolves_when_udp_blocked - tcp_only_iterative_resolution - tcp_fallback_handles_nxdomain - udp_auto_disable_resets - Integration test suite (4 suites, 51 tests) - Network probe script (tests/network-probe.sh) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: DNSSEC verified badge in dashboard query log - Add dnssec field to QueryLogEntry, track validation status per query - DnssecStatus::as_str() for API serialization - Dashboard shows green checkmark next to DNSSEC-verified responses - Blog post: add "How keys get there" section, transport resilience section, trim code blocks, update What's Next Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use SVG shield for DNSSEC badge, update blog HTML Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: NS cache lookup from authorities, UDP re-probe, shield alignment - find_closest_ns checks authorities (not just answers) for NS records, fixing TLD priming cache misses that caused redundant root queries - Periodic UDP re-probe every 5min when disabled — re-enables UDP after switching from a restrictive network to an open one - Dashboard DNSSEC shield uses fixed-width container for alignment - Blog post: tuck key-tag into trust anchor paragraph Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: TCP single-write, mock server consistency, integration tests - TCP single-write fix: combine length prefix + message to avoid split segments that Microsoft/Azure DNS servers reject - Mock server (spawn_tcp_dns_server) updated to use single-write too - Tests: forward_tcp_wire_format, forward_tcp_single_segment_write - Integration: real-server checks for Microsoft/Office/Azure domains Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: recursive bar in dashboard, special-use domain interception Dashboard: - Add Recursive bar to resolution paths chart (cyan, distinct from Override) - Add RECURSIVE path tag style in query log Special-use domains (RFC 6761/6303/8880/9462): - .localhost → 127.0.0.1 (RFC 6761) - Private reverse PTR (10.x, 192.168.x, 172.16-31.x) → NXDOMAIN - _dns.resolver.arpa (DDR) → NXDOMAIN - ipv4only.arpa (NAT64) → 192.0.0.170/171 - mDNS service discovery for private ranges → NXDOMAIN Eliminates ~900ms SERVFAILs for macOS system queries that were hitting root servers unnecessarily. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: move generated blog HTML to site/blog/posts/, gitignore - Generated HTML now in site/blog/posts/ (gitignored) - CI workflow runs pandoc + make blog before deploy - Updated all internal blog links to /blog/posts/ path - blog/*.md remains the source of truth Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: review feedback — memory ordering, RRSIG time, NS resolution - Ordering::Relaxed → Acquire/Release for UDP_DISABLED/UDP_FAILURES (ARM correctness for cross-thread coordination) - RRSIG time validation: serial number arithmetic (RFC 4034 §3.1.5) + 300s clock skew fudge factor (matches BIND) - resolve_ns_addrs_from_glue collects addresses from ALL NS names, not just the first with glue (improves failover) - is_special_use_domain: eliminate 16 format! allocations per .in-addr.arpa query (parse octet instead) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: API endpoint tests, coverage target - 8 new axum handler tests: health, stats, query-log, overrides CRUD, cache, blocking stats, services CRUD, dashboard HTML - Tests use tower::oneshot — no network, no server startup - test_ctx() builds minimal ServerCtx for isolated testing - `make coverage` target (cargo-tarpaulin), separate from `make all` - 82 total tests (was 74) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit was merged in pull request #17.
This commit is contained in:
128
tests/network-probe.sh
Executable file
128
tests/network-probe.sh
Executable file
@@ -0,0 +1,128 @@
|
||||
#!/usr/bin/env bash
|
||||
# Network probe: tests which DNS transports are available on the current network.
|
||||
# Run on a problematic network to diagnose what's blocked.
|
||||
# Usage: ./tests/network-probe.sh
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
GREEN="\033[32m"
|
||||
RED="\033[31m"
|
||||
DIM="\033[90m"
|
||||
RESET="\033[0m"
|
||||
|
||||
PASSED=0
|
||||
FAILED=0
|
||||
|
||||
probe() {
|
||||
local name="$1"
|
||||
local cmd="$2"
|
||||
local expect="$3"
|
||||
|
||||
local result
|
||||
result=$(eval "$cmd" 2>&1) || true
|
||||
|
||||
if echo "$result" | grep -q "$expect"; then
|
||||
PASSED=$((PASSED + 1))
|
||||
printf " ${GREEN}✓${RESET} %-45s ${DIM}%s${RESET}\n" "$name" "$(echo "$result" | head -1 | cut -c1-60)"
|
||||
else
|
||||
FAILED=$((FAILED + 1))
|
||||
printf " ${RED}✗${RESET} %-45s ${DIM}blocked/timeout${RESET}\n" "$name"
|
||||
fi
|
||||
}
|
||||
|
||||
echo ""
|
||||
echo "Network DNS Transport Probe"
|
||||
echo "==========================="
|
||||
echo "Network: $(networksetup -getairportnetwork en0 2>/dev/null | sed 's/Current Wi-Fi Network: //' || echo 'unknown')"
|
||||
echo "Local IP: $(ipconfig getifaddr en0 2>/dev/null || echo 'unknown')"
|
||||
echo "Gateway: $(route -n get default 2>/dev/null | grep gateway | awk '{print $2}' || echo 'unknown')"
|
||||
echo ""
|
||||
|
||||
echo "=== UDP port 53 (recursive resolution) ==="
|
||||
probe "Root server a (198.41.0.4)" \
|
||||
"dig @198.41.0.4 . NS +short +time=5 +tries=1" \
|
||||
"root-servers"
|
||||
|
||||
probe "Root server k (193.0.14.129)" \
|
||||
"dig @193.0.14.129 . NS +short +time=5 +tries=1" \
|
||||
"root-servers"
|
||||
|
||||
probe "Google DNS (8.8.8.8)" \
|
||||
"dig @8.8.8.8 google.com A +short +time=5 +tries=1" \
|
||||
"\."
|
||||
|
||||
probe "Cloudflare (1.1.1.1)" \
|
||||
"dig @1.1.1.1 cloudflare.com A +short +time=5 +tries=1" \
|
||||
"\."
|
||||
|
||||
probe ".com TLD (192.5.6.30)" \
|
||||
"dig @192.5.6.30 google.com NS +short +time=5 +tries=1" \
|
||||
"google"
|
||||
|
||||
echo ""
|
||||
echo "=== TCP port 53 ==="
|
||||
probe "Google DNS TCP (8.8.8.8)" \
|
||||
"dig @8.8.8.8 google.com A +short +tcp +time=5 +tries=1" \
|
||||
"\."
|
||||
|
||||
probe "Root server TCP (198.41.0.4)" \
|
||||
"dig @198.41.0.4 . NS +short +tcp +time=5 +tries=1" \
|
||||
"root-servers"
|
||||
|
||||
echo ""
|
||||
echo "=== DoT port 853 (DNS-over-TLS) ==="
|
||||
probe "Quad9 DoT (9.9.9.9:853)" \
|
||||
"echo Q | openssl s_client -connect 9.9.9.9:853 -servername dns.quad9.net 2>&1 | grep 'verify return'" \
|
||||
"verify return"
|
||||
|
||||
probe "Cloudflare DoT (1.1.1.1:853)" \
|
||||
"echo Q | openssl s_client -connect 1.1.1.1:853 -servername cloudflare-dns.com 2>&1 | grep 'verify return'" \
|
||||
"verify return"
|
||||
|
||||
echo ""
|
||||
echo "=== DoH port 443 (DNS-over-HTTPS) ==="
|
||||
probe "Quad9 DoH (dns.quad9.net)" \
|
||||
"curl -s -m 5 -H 'accept: application/dns-json' 'https://dns.quad9.net:443/dns-query?name=google.com&type=A'" \
|
||||
"Answer"
|
||||
|
||||
probe "Cloudflare DoH (1.1.1.1)" \
|
||||
"curl -s -m 5 -H 'accept: application/dns-json' 'https://1.1.1.1/dns-query?name=google.com&type=A'" \
|
||||
"Answer"
|
||||
|
||||
probe "Google DoH (dns.google)" \
|
||||
"curl -s -m 5 'https://dns.google/resolve?name=google.com&type=A'" \
|
||||
"Answer"
|
||||
|
||||
echo ""
|
||||
echo "=== ISP DNS ==="
|
||||
# Detect system DNS
|
||||
SYS_DNS=$(scutil --dns 2>/dev/null | grep "nameserver\[0\]" | head -1 | awk '{print $3}' || echo "unknown")
|
||||
if [ "$SYS_DNS" != "unknown" ] && [ "$SYS_DNS" != "127.0.0.1" ]; then
|
||||
probe "ISP DNS ($SYS_DNS)" \
|
||||
"dig @$SYS_DNS google.com A +short +time=5 +tries=1" \
|
||||
"\."
|
||||
else
|
||||
printf " ${DIM}– System DNS is $SYS_DNS (skipped)${RESET}\n"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "==========================="
|
||||
TOTAL=$((PASSED + FAILED))
|
||||
printf "Results: ${GREEN}%d passed${RESET}, ${RED}%d blocked${RESET} / %d total\n" "$PASSED" "$FAILED" "$TOTAL"
|
||||
|
||||
echo ""
|
||||
echo "Recommendation:"
|
||||
if [ "$FAILED" -eq 0 ]; then
|
||||
echo " All transports available. Recursive mode will work."
|
||||
elif dig @198.41.0.4 . NS +short +time=5 +tries=1 2>&1 | grep -q "root-servers"; then
|
||||
echo " UDP:53 works. Recursive mode will work."
|
||||
else
|
||||
echo " UDP:53 blocked — recursive mode will NOT work on this network."
|
||||
if curl -s -m 5 'https://dns.quad9.net:443/dns-query?name=test.com&type=A' 2>&1 | grep -q "Answer"; then
|
||||
echo " DoH (port 443) works — use mode = \"forward\" with DoH upstream."
|
||||
elif echo Q | openssl s_client -connect 9.9.9.9:853 2>&1 | grep -q "verify return"; then
|
||||
echo " DoT (port 853) works — DoT upstream would work (not yet implemented)."
|
||||
else
|
||||
echo " Only ISP DNS available. Use mode = \"forward\" with ISP auto-detect."
|
||||
fi
|
||||
fi
|
||||
Reference in New Issue
Block a user