Files
numa/tests/network-probe.sh
Razvan Dimescu a84f2e7f1d feat: recursive DNS + DNSSEC + TCP fallback (#17)
* feat: recursive resolution + full DNSSEC validation

Numa becomes a true DNS resolver — resolves from root nameservers
with complete DNSSEC chain-of-trust verification.

Recursive resolution:
- Iterative RFC 1034 from configurable root hints (13 default)
- CNAME chasing (depth 8), referral following (depth 10)
- A+AAAA glue extraction, IPv6 nameserver support
- TLD priming: NS + DS + DNSKEY for 34 gTLDs + EU ccTLDs
- Config: mode = "recursive" in [upstream], root_hints, prime_tlds

DNSSEC (all 4 phases):
- EDNS0 OPT pseudo-record (DO bit, 1232 payload per DNS Flag Day 2020)
- DNSKEY, DS, RRSIG, NSEC, NSEC3 record types with wire read/write
- Signature verification via ring: RSA/SHA-256, ECDSA P-256, Ed25519
- Chain-of-trust: zone DNSKEY → parent DS → root KSK (key tag 20326)
- DNSKEY RRset self-signature verification (RRSIG(DNSKEY) by KSK)
- RRSIG expiration/inception time validation
- NSEC: NXDOMAIN gap proofs, NODATA type absence, wildcard denial
- NSEC3: SHA-1 iterated hashing, closest encloser proof, hash range
- Authority RRSIG verification for denial proofs
- Config: [dnssec] enabled/strict (default false, opt-in)
- AD bit on Secure, SERVFAIL on Bogus+strict
- DnssecStatus cached per entry, ValidationStats logging

Performance:
- TLD chain pre-warmed on startup (root DNSKEY + TLD DS/DNSKEY)
- Referral DS piggybacking from authority sections
- DNSKEY prefetch before validation loop
- Cold-cache validation: ~1 DNSKEY fetch (down from 5)
- Benchmarks: RSA 10.9µs, ECDSA 174ns, DS verify 257ns

Also:
- write_qname fix for root domain "." (was producing malformed queries)
- write_record_header() dedup, write_bytes() bulk writes
- DnsRecord::domain() + query_type() accessors
- UpstreamMode enum, DEFAULT_EDNS_PAYLOAD const
- Real glue TTL (was hardcoded 3600)
- DNSSEC restricted to recursive mode only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: TCP fallback, query minimization, UDP auto-disable

Transport resilience for restrictive networks (ISPs blocking UDP:53):
- DNS-over-TCP fallback: UDP fail/truncation → automatic TCP retry
- UDP auto-disable: after 3 consecutive failures, switch to TCP-first
- IPv6 → TCP directly (UDP socket binds 0.0.0.0, can't reach IPv6)
- Network change resets UDP detection for re-probing
- Root hint rotation in TLD priming

Privacy:
- RFC 7816 query minimization: root servers see TLD only, not full name

Code quality:
- Merged find_starting_ns + find_starting_zone → find_closest_ns
- Extracted resolve_ns_addrs_from_glue shared helper
- Removed overall timeout wrapper (per-hop timeouts sufficient)
- forward_tcp for DNS-over-TCP (RFC 1035 §4.2.2)

Testing:
- Mock TCP-only DNS server for fallback tests (no network needed)
- tcp_fallback_resolves_when_udp_blocked
- tcp_only_iterative_resolution
- tcp_fallback_handles_nxdomain
- udp_auto_disable_resets
- Integration test suite (4 suites, 51 tests)
- Network probe script (tests/network-probe.sh)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: DNSSEC verified badge in dashboard query log

- Add dnssec field to QueryLogEntry, track validation status per query
- DnssecStatus::as_str() for API serialization
- Dashboard shows green checkmark next to DNSSEC-verified responses
- Blog post: add "How keys get there" section, transport resilience section,
  trim code blocks, update What's Next

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use SVG shield for DNSSEC badge, update blog HTML

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: NS cache lookup from authorities, UDP re-probe, shield alignment

- find_closest_ns checks authorities (not just answers) for NS records,
  fixing TLD priming cache misses that caused redundant root queries
- Periodic UDP re-probe every 5min when disabled — re-enables UDP
  after switching from a restrictive network to an open one
- Dashboard DNSSEC shield uses fixed-width container for alignment
- Blog post: tuck key-tag into trust anchor paragraph

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: TCP single-write, mock server consistency, integration tests

- TCP single-write fix: combine length prefix + message to avoid split
  segments that Microsoft/Azure DNS servers reject
- Mock server (spawn_tcp_dns_server) updated to use single-write too
- Tests: forward_tcp_wire_format, forward_tcp_single_segment_write
- Integration: real-server checks for Microsoft/Office/Azure domains

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: recursive bar in dashboard, special-use domain interception

Dashboard:
- Add Recursive bar to resolution paths chart (cyan, distinct from Override)
- Add RECURSIVE path tag style in query log

Special-use domains (RFC 6761/6303/8880/9462):
- .localhost → 127.0.0.1 (RFC 6761)
- Private reverse PTR (10.x, 192.168.x, 172.16-31.x) → NXDOMAIN
- _dns.resolver.arpa (DDR) → NXDOMAIN
- ipv4only.arpa (NAT64) → 192.0.0.170/171
- mDNS service discovery for private ranges → NXDOMAIN

Eliminates ~900ms SERVFAILs for macOS system queries that were
hitting root servers unnecessarily.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: move generated blog HTML to site/blog/posts/, gitignore

- Generated HTML now in site/blog/posts/ (gitignored)
- CI workflow runs pandoc + make blog before deploy
- Updated all internal blog links to /blog/posts/ path
- blog/*.md remains the source of truth

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review feedback — memory ordering, RRSIG time, NS resolution

- Ordering::Relaxed → Acquire/Release for UDP_DISABLED/UDP_FAILURES
  (ARM correctness for cross-thread coordination)
- RRSIG time validation: serial number arithmetic (RFC 4034 §3.1.5)
  + 300s clock skew fudge factor (matches BIND)
- resolve_ns_addrs_from_glue collects addresses from ALL NS names,
  not just the first with glue (improves failover)
- is_special_use_domain: eliminate 16 format! allocations per
  .in-addr.arpa query (parse octet instead)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: API endpoint tests, coverage target

- 8 new axum handler tests: health, stats, query-log, overrides CRUD,
  cache, blocking stats, services CRUD, dashboard HTML
- Tests use tower::oneshot — no network, no server startup
- test_ctx() builds minimal ServerCtx for isolated testing
- `make coverage` target (cargo-tarpaulin), separate from `make all`
- 82 total tests (was 74)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 04:03:47 +02:00

129 lines
4.2 KiB
Bash
Executable File
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
#!/usr/bin/env bash
# Network probe: tests which DNS transports are available on the current network.
# Run on a problematic network to diagnose what's blocked.
# Usage: ./tests/network-probe.sh
set -euo pipefail
GREEN="\033[32m"
RED="\033[31m"
DIM="\033[90m"
RESET="\033[0m"
PASSED=0
FAILED=0
probe() {
local name="$1"
local cmd="$2"
local expect="$3"
local result
result=$(eval "$cmd" 2>&1) || true
if echo "$result" | grep -q "$expect"; then
PASSED=$((PASSED + 1))
printf " ${GREEN}${RESET} %-45s ${DIM}%s${RESET}\n" "$name" "$(echo "$result" | head -1 | cut -c1-60)"
else
FAILED=$((FAILED + 1))
printf " ${RED}${RESET} %-45s ${DIM}blocked/timeout${RESET}\n" "$name"
fi
}
echo ""
echo "Network DNS Transport Probe"
echo "==========================="
echo "Network: $(networksetup -getairportnetwork en0 2>/dev/null | sed 's/Current Wi-Fi Network: //' || echo 'unknown')"
echo "Local IP: $(ipconfig getifaddr en0 2>/dev/null || echo 'unknown')"
echo "Gateway: $(route -n get default 2>/dev/null | grep gateway | awk '{print $2}' || echo 'unknown')"
echo ""
echo "=== UDP port 53 (recursive resolution) ==="
probe "Root server a (198.41.0.4)" \
"dig @198.41.0.4 . NS +short +time=5 +tries=1" \
"root-servers"
probe "Root server k (193.0.14.129)" \
"dig @193.0.14.129 . NS +short +time=5 +tries=1" \
"root-servers"
probe "Google DNS (8.8.8.8)" \
"dig @8.8.8.8 google.com A +short +time=5 +tries=1" \
"\."
probe "Cloudflare (1.1.1.1)" \
"dig @1.1.1.1 cloudflare.com A +short +time=5 +tries=1" \
"\."
probe ".com TLD (192.5.6.30)" \
"dig @192.5.6.30 google.com NS +short +time=5 +tries=1" \
"google"
echo ""
echo "=== TCP port 53 ==="
probe "Google DNS TCP (8.8.8.8)" \
"dig @8.8.8.8 google.com A +short +tcp +time=5 +tries=1" \
"\."
probe "Root server TCP (198.41.0.4)" \
"dig @198.41.0.4 . NS +short +tcp +time=5 +tries=1" \
"root-servers"
echo ""
echo "=== DoT port 853 (DNS-over-TLS) ==="
probe "Quad9 DoT (9.9.9.9:853)" \
"echo Q | openssl s_client -connect 9.9.9.9:853 -servername dns.quad9.net 2>&1 | grep 'verify return'" \
"verify return"
probe "Cloudflare DoT (1.1.1.1:853)" \
"echo Q | openssl s_client -connect 1.1.1.1:853 -servername cloudflare-dns.com 2>&1 | grep 'verify return'" \
"verify return"
echo ""
echo "=== DoH port 443 (DNS-over-HTTPS) ==="
probe "Quad9 DoH (dns.quad9.net)" \
"curl -s -m 5 -H 'accept: application/dns-json' 'https://dns.quad9.net:443/dns-query?name=google.com&type=A'" \
"Answer"
probe "Cloudflare DoH (1.1.1.1)" \
"curl -s -m 5 -H 'accept: application/dns-json' 'https://1.1.1.1/dns-query?name=google.com&type=A'" \
"Answer"
probe "Google DoH (dns.google)" \
"curl -s -m 5 'https://dns.google/resolve?name=google.com&type=A'" \
"Answer"
echo ""
echo "=== ISP DNS ==="
# Detect system DNS
SYS_DNS=$(scutil --dns 2>/dev/null | grep "nameserver\[0\]" | head -1 | awk '{print $3}' || echo "unknown")
if [ "$SYS_DNS" != "unknown" ] && [ "$SYS_DNS" != "127.0.0.1" ]; then
probe "ISP DNS ($SYS_DNS)" \
"dig @$SYS_DNS google.com A +short +time=5 +tries=1" \
"\."
else
printf " ${DIM} System DNS is $SYS_DNS (skipped)${RESET}\n"
fi
echo ""
echo "==========================="
TOTAL=$((PASSED + FAILED))
printf "Results: ${GREEN}%d passed${RESET}, ${RED}%d blocked${RESET} / %d total\n" "$PASSED" "$FAILED" "$TOTAL"
echo ""
echo "Recommendation:"
if [ "$FAILED" -eq 0 ]; then
echo " All transports available. Recursive mode will work."
elif dig @198.41.0.4 . NS +short +time=5 +tries=1 2>&1 | grep -q "root-servers"; then
echo " UDP:53 works. Recursive mode will work."
else
echo " UDP:53 blocked — recursive mode will NOT work on this network."
if curl -s -m 5 'https://dns.quad9.net:443/dns-query?name=test.com&type=A' 2>&1 | grep -q "Answer"; then
echo " DoH (port 443) works — use mode = \"forward\" with DoH upstream."
elif echo Q | openssl s_client -connect 9.9.9.9:853 2>&1 | grep -q "verify return"; then
echo " DoT (port 853) works — DoT upstream would work (not yet implemented)."
else
echo " Only ISP DNS available. Use mode = \"forward\" with ISP auto-detect."
fi
fi