* feat: recursive resolution + full DNSSEC validation

  Numa becomes a true DNS resolver: it resolves from root nameservers with
  complete DNSSEC chain-of-trust verification.

  Recursive resolution:
  - Iterative RFC 1034 from configurable root hints (13 default)
  - CNAME chasing (depth 8), referral following (depth 10)
  - A+AAAA glue extraction, IPv6 nameserver support
  - TLD priming: NS + DS + DNSKEY for 34 gTLDs + EU ccTLDs
  - Config: mode = "recursive" in [upstream], root_hints, prime_tlds

  DNSSEC (all 4 phases):
  - EDNS0 OPT pseudo-record (DO bit, 1232 payload per DNS Flag Day 2020)
  - DNSKEY, DS, RRSIG, NSEC, NSEC3 record types with wire read/write
  - Signature verification via ring: RSA/SHA-256, ECDSA P-256, Ed25519
  - Chain-of-trust: zone DNSKEY → parent DS → root KSK (key tag 20326)
  - DNSKEY RRset self-signature verification (RRSIG(DNSKEY) by KSK)
  - RRSIG expiration/inception time validation
  - NSEC: NXDOMAIN gap proofs, NODATA type absence, wildcard denial
  - NSEC3: SHA-1 iterated hashing, closest encloser proof, hash range
  - Authority RRSIG verification for denial proofs
  - Config: [dnssec] enabled/strict (default false, opt-in)
  - AD bit on Secure, SERVFAIL on Bogus+strict
  - DnssecStatus cached per entry, ValidationStats logging

  Performance:
  - TLD chain pre-warmed on startup (root DNSKEY + TLD DS/DNSKEY)
  - Referral DS piggybacking from authority sections
  - DNSKEY prefetch before validation loop
  - Cold-cache validation: ~1 DNSKEY fetch (down from 5)
  - Benchmarks: RSA 10.9µs, ECDSA 174ns, DS verify 257ns

  Also:
  - write_qname fix for root domain "." (was producing malformed queries)
  - write_record_header() dedup, write_bytes() bulk writes
  - DnsRecord::domain() + query_type() accessors
  - UpstreamMode enum, DEFAULT_EDNS_PAYLOAD const
  - Real glue TTL (was hardcoded 3600)
  - DNSSEC restricted to recursive mode only

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: TCP fallback, query minimization, UDP auto-disable

  Transport resilience for restrictive networks (ISPs blocking UDP:53):
  - DNS-over-TCP fallback: UDP fail/truncation → automatic TCP retry
  - UDP auto-disable: after 3 consecutive failures, switch to TCP-first
  - IPv6 → TCP directly (UDP socket binds 0.0.0.0, can't reach IPv6)
  - Network change resets UDP detection for re-probing
  - Root hint rotation in TLD priming

  Privacy:
  - RFC 7816 query minimization: root servers see TLD only, not full name

  Code quality:
  - Merged find_starting_ns + find_starting_zone → find_closest_ns
  - Extracted resolve_ns_addrs_from_glue shared helper
  - Removed overall timeout wrapper (per-hop timeouts sufficient)
  - forward_tcp for DNS-over-TCP (RFC 1035 §4.2.2)

  Testing:
  - Mock TCP-only DNS server for fallback tests (no network needed)
  - tcp_fallback_resolves_when_udp_blocked
  - tcp_only_iterative_resolution
  - tcp_fallback_handles_nxdomain
  - udp_auto_disable_resets
  - Integration test suite (4 suites, 51 tests)
  - Network probe script (tests/network-probe.sh)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: DNSSEC verified badge in dashboard query log

  - Add dnssec field to QueryLogEntry, track validation status per query
  - DnssecStatus::as_str() for API serialization
  - Dashboard shows green checkmark next to DNSSEC-verified responses
  - Blog post: add "How keys get there" section, transport resilience
    section, trim code blocks, update What's Next

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use SVG shield for DNSSEC badge, update blog HTML

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: NS cache lookup from authorities, UDP re-probe, shield alignment

  - find_closest_ns checks authorities (not just answers) for NS records,
    fixing TLD priming cache misses that caused redundant root queries
  - Periodic UDP re-probe every 5min when disabled; re-enables UDP after
    switching from a restrictive network to an open one
  - Dashboard DNSSEC shield uses fixed-width container for alignment
  - Blog post: tuck key-tag into trust anchor paragraph

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: TCP single-write, mock server consistency, integration tests

  - TCP single-write fix: combine length prefix + message to avoid split
    segments that Microsoft/Azure DNS servers reject
  - Mock server (spawn_tcp_dns_server) updated to use single-write too
  - Tests: forward_tcp_wire_format, forward_tcp_single_segment_write
  - Integration: real-server checks for Microsoft/Office/Azure domains

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: recursive bar in dashboard, special-use domain interception

  Dashboard:
  - Add Recursive bar to resolution paths chart (cyan, distinct from Override)
  - Add RECURSIVE path tag style in query log

  Special-use domains (RFC 6761/6303/8880/9462):
  - .localhost → 127.0.0.1 (RFC 6761)
  - Private reverse PTR (10.x, 192.168.x, 172.16-31.x) → NXDOMAIN
  - _dns.resolver.arpa (DDR) → NXDOMAIN
  - ipv4only.arpa (NAT64) → 192.0.0.170/171
  - mDNS service discovery for private ranges → NXDOMAIN

  Eliminates ~900ms SERVFAILs for macOS system queries that were hitting
  root servers unnecessarily.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: move generated blog HTML to site/blog/posts/, gitignore

  - Generated HTML now in site/blog/posts/ (gitignored)
  - CI workflow runs pandoc + make blog before deploy
  - Updated all internal blog links to /blog/posts/ path
  - blog/*.md remains the source of truth

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review feedback (memory ordering, RRSIG time, NS resolution)

  - Ordering::Relaxed → Acquire/Release for UDP_DISABLED/UDP_FAILURES
    (ARM correctness for cross-thread coordination)
  - RRSIG time validation: serial number arithmetic (RFC 4034 §3.1.5)
    + 300s clock skew fudge factor (matches BIND)
  - resolve_ns_addrs_from_glue collects addresses from ALL NS names,
    not just the first with glue (improves failover)
  - is_special_use_domain: eliminate 16 format! allocations per
    .in-addr.arpa query (parse octet instead)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: API endpoint tests, coverage target

  - 8 new axum handler tests: health, stats, query-log, overrides CRUD,
    cache, blocking stats, services CRUD, dashboard HTML
  - Tests use tower::oneshot: no network, no server startup
  - test_ctx() builds minimal ServerCtx for isolated testing
  - `make coverage` target (cargo-tarpaulin), separate from `make all`
  - 82 total tests (was 74)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
420 lines
10 KiB
Bash
Executable File
#!/usr/bin/env bash
# Integration test suite for Numa
# Runs a test instance on port 5354, validates all features, exits with status.
# Usage: ./tests/integration.sh [release|debug]

set -euo pipefail

MODE="${1:-release}"
BINARY="./target/$MODE/numa"
PORT=5354
API_PORT=5381
CONFIG="/tmp/numa-integration-test.toml"
LOG="/tmp/numa-integration-test.log"
PASSED=0
FAILED=0

# Colors
GREEN="\033[32m"
RED="\033[31m"
DIM="\033[90m"
RESET="\033[0m"

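# Sketch, not part of the original script: because of `set -e`, an early
# abort can leave a background Numa instance running and holding the test
# ports. This assumes NUMA_PID, when set, is the PID of the most recently
# started instance (as in the suites below). `cleanup` is a hypothetical
# helper name.
cleanup() {
    if [ -n "${NUMA_PID:-}" ]; then
        # Both commands may fail if the process is already gone; that is fine.
        kill "$NUMA_PID" 2>/dev/null || true
        wait "$NUMA_PID" 2>/dev/null || true
    fi
}
trap cleanup EXIT
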
# check NAME EXPECTED ACTUAL
# Passes when ACTUAL contains a match for EXPECTED. EXPECTED is a grep basic
# regex, so "." means "any non-empty output" and '^0$' means "exactly 0".
check() {
    local name="$1"
    local expected="$2"
    local actual="$3"

    if echo "$actual" | grep -q "$expected"; then
        PASSED=$((PASSED + 1))
        printf "  ${GREEN}✓${RESET} %s\n" "$name"
    else
        FAILED=$((FAILED + 1))
        printf "  ${RED}✗${RESET} %s\n" "$name"
        printf "    ${DIM}expected: %s${RESET}\n" "$expected"
        printf "    ${DIM}     got: %s${RESET}\n" "$actual"
    fi
}

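# Sketch, not wired into the suites below: a readiness poll that could stand
# in for the fixed `sleep` after launching Numa. `wait_for_tcp` is a
# hypothetical helper. It assumes the API port accepts TCP connections once
# the server is up, and it relies on bash's /dev/tcp redirection.
wait_for_tcp() {
    local host="$1"
    local port="$2"
    local attempts="${3:-40}"
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        # The subshell opens the connection and closes it again on exit.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        sleep 0.25
        i=$((i + 1))
    done
    return 1
}
# Possible use after starting Numa:
#   wait_for_tcp 127.0.0.1 "$API_PORT" || { echo "Numa did not come up"; tail -5 "$LOG"; }
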
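# Build if needed
if [ ! -f "$BINARY" ]; then
    echo "Building $MODE..."
    # cargo build has no --debug flag; the debug profile is the default.
    if [ "$MODE" = "release" ]; then
        cargo build --release
    else
        cargo build
    fi
fi
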
run_test_suite() {
    local SUITE_NAME="$1"
    local SUITE_CONFIG="$2"

    cat > "$CONFIG" << CONF
$SUITE_CONFIG
CONF

    echo "Starting Numa on :$PORT ($SUITE_NAME)..."
    RUST_LOG=info "$BINARY" "$CONFIG" > "$LOG" 2>&1 &
    NUMA_PID=$!
    sleep 4

    if ! kill -0 "$NUMA_PID" 2>/dev/null; then
        echo "Failed to start Numa:"
        tail -5 "$LOG"
        return 1
    fi

    DIG="dig @127.0.0.1 -p $PORT +time=5 +tries=1"

    echo ""
    echo "=== Resolution ==="

    check "A record (google.com)" \
        "." \
        "$($DIG google.com A +short)"

    check "AAAA record (google.com)" \
        ":" \
        "$($DIG google.com AAAA +short)"

    check "CNAME chasing (www.github.com)" \
        "github.com" \
        "$($DIG www.github.com A +short)"

    check "MX records (gmail.com)" \
        "gmail-smtp-in" \
        "$($DIG gmail.com MX +short)"

    check "NS records (cloudflare.com)" \
        "cloudflare.com" \
        "$($DIG cloudflare.com NS +short)"

    check "NXDOMAIN" \
        "NXDOMAIN" \
        "$($DIG nope12345678.com A 2>&1 | grep status:)"

    echo ""
    echo "=== Ad Blocking ==="

    # Look only at the [blocking] section: [dnssec] and [proxy] also carry
    # an `enabled` key and must not be mistaken for it.
    if echo "$SUITE_CONFIG" | grep -A1 '^\[blocking\]' | grep -q 'enabled = true'; then
        check "Blocked domain → 0.0.0.0" \
            "0.0.0.0" \
            "$($DIG ads.google.com A +short)"
    else
        local ADS
        ADS=$($DIG ads.google.com A +short 2>/dev/null || true)
        if echo "$ADS" | grep -q "0.0.0.0"; then
            check "Blocking disabled but domain blocked" "should-resolve" "0.0.0.0"
        else
            check "Blocking disabled: domain resolves normally" "." "$ADS"
        fi
    fi

    echo ""
    echo "=== Cache ==="

    # Warm the cache; a failed warm-up must not abort the run under set -e.
    $DIG example.com A +short > /dev/null 2>&1 || true
    sleep 1
    check "Cache hit returns result" \
        "." \
        "$($DIG example.com A +short)"

    echo ""
    echo "=== Connectivity ==="

    # Apple captive portal can be slow/flaky on some networks
    local CAPTIVE
    CAPTIVE=$($DIG captive.apple.com A +short 2>/dev/null || echo "timeout")
    if echo "$CAPTIVE" | grep -q "apple\|17\.\|timeout"; then
        check "Apple captive portal" "." "$CAPTIVE"
    else
        check "Apple captive portal" "apple" "$CAPTIVE"
    fi

    check "CDN (jsdelivr)" \
        "." \
        "$($DIG cdn.jsdelivr.net A +short)"

    echo ""
    echo "=== API ==="

    check "Health endpoint" \
        "ok" \
        "$(curl -s http://127.0.0.1:$API_PORT/health)"

    check "Stats endpoint" \
        "uptime_secs" \
        "$(curl -s http://127.0.0.1:$API_PORT/stats)"

    echo ""
    echo "=== Log Health ==="

    # grep -c already prints 0 (and exits 1) when nothing matches, so only
    # swallow the exit status. Anchor the pattern so 10 or 20 errors fail.
    ERRORS=$(grep -c 'RECURSIVE ERROR\|PARSE ERROR\|HANDLER ERROR\|panic' "$LOG" 2>/dev/null || true)
    check "No critical errors in log" \
        '^0$' \
        "$ERRORS"

    kill "$NUMA_PID" 2>/dev/null || true
    wait "$NUMA_PID" 2>/dev/null || true
    sleep 1
}

# ---- Suite 1: Recursive mode + DNSSEC ----
echo ""
echo "╔══════════════════════════════════════════╗"
echo "║  Suite 1: Recursive + DNSSEC + Blocking  ║"
echo "╚══════════════════════════════════════════╝"

run_test_suite "recursive + DNSSEC + blocking" "
[server]
bind_addr = \"127.0.0.1:5354\"
api_port = 5381

[upstream]
mode = \"recursive\"

[cache]
max_entries = 10000
min_ttl = 60
max_ttl = 86400

[blocking]
enabled = true

[proxy]
enabled = false

[dnssec]
enabled = true
"

DIG="dig @127.0.0.1 -p $PORT +time=5 +tries=1"

echo ""
echo "=== DNSSEC (recursive only) ==="

# Restart for DNSSEC checks (the suite 1 instance was killed)
RUST_LOG=info "$BINARY" "$CONFIG" > "$LOG" 2>&1 &
NUMA_PID=$!
sleep 4

check "AD bit set (cloudflare.com)" \
    " ad" \
    "$($DIG cloudflare.com A +dnssec 2>&1 | grep flags:)"

check "EDNS DO bit echoed" \
    "flags: do" \
    "$($DIG cloudflare.com A +dnssec 2>&1 | grep 'EDNS:')"

echo ""
echo "=== TCP wire format (real servers) ==="

# Microsoft's Azure DNS servers require length+message in a single TCP segment.
# This test catches the split-write bug that caused early-eof SERVFAILs.
check "Microsoft domain (update.code.visualstudio.com)" \
    "NOERROR" \
    "$($DIG update.code.visualstudio.com A 2>&1 | grep status:)"

check "Office domain (ecs.office.com)" \
    "NOERROR" \
    "$($DIG ecs.office.com A 2>&1 | grep status:)"

# Azure Application Insights: another strict TCP server
check "Azure telemetry (eastus2-3.in.applicationinsights.azure.com)" \
    "." \
    "$($DIG eastus2-3.in.applicationinsights.azure.com A +short 2>/dev/null || echo 'timeout')"

kill "$NUMA_PID" 2>/dev/null || true
wait "$NUMA_PID" 2>/dev/null || true
sleep 1

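# Illustration only, not used by any check in this script: DNS over TCP
# prefixes every message with a two-byte big-endian length (RFC 1035 §4.2.2).
# The single-write fix exercised by the "TCP wire format" checks amounts to
# building prefix and message in one buffer before sending.
# `dns_tcp_frame_hex` is a hypothetical helper that shows the framing on a
# hex-encoded message; it does not send anything.
dns_tcp_frame_hex() {
    local msg_hex="$1"
    # Two hex digits per byte.
    local len=$(( ${#msg_hex} / 2 ))
    printf '%04x%s\n' "$len" "$msg_hex"
}
# dns_tcp_frame_hex abcd   prints 0002abcd
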
# ---- Suite 2: Forward mode (backward compat) ----
echo ""
echo "╔══════════════════════════════════════════╗"
echo "║  Suite 2: Forward (DoH) + Blocking       ║"
echo "╚══════════════════════════════════════════╝"

run_test_suite "forward DoH + blocking" "
[server]
bind_addr = \"127.0.0.1:5354\"
api_port = 5381

[upstream]
mode = \"forward\"
address = \"https://9.9.9.9/dns-query\"

[cache]
max_entries = 10000
min_ttl = 60
max_ttl = 86400

[blocking]
enabled = true

[proxy]
enabled = false
"

# ---- Suite 3: Forward UDP (plain, no DoH) ----
echo ""
echo "╔══════════════════════════════════════════╗"
echo "║  Suite 3: Forward (UDP) + No Blocking    ║"
echo "╚══════════════════════════════════════════╝"

run_test_suite "forward UDP, no blocking" "
[server]
bind_addr = \"127.0.0.1:5354\"
api_port = 5381

[upstream]
mode = \"forward\"
address = \"9.9.9.9\"
port = 53

[cache]
max_entries = 10000
min_ttl = 60
max_ttl = 86400

[blocking]
enabled = false

[proxy]
enabled = false
"

# Verify blocking is actually off
RUST_LOG=info "$BINARY" "$CONFIG" > "$LOG" 2>&1 &
NUMA_PID=$!
sleep 3

echo ""
echo "=== Blocking disabled ==="
# `|| true`: a dig timeout should be reported as a failed check, not abort the run.
ADS_RESULT=$($DIG ads.google.com A +short 2>/dev/null || true)
if echo "$ADS_RESULT" | grep -q "0.0.0.0"; then
    check "ads.google.com NOT blocked (blocking disabled)" "not-0.0.0.0" "0.0.0.0"
else
    check "ads.google.com NOT blocked (blocking disabled)" "." "$ADS_RESULT"
fi

kill "$NUMA_PID" 2>/dev/null || true
wait "$NUMA_PID" 2>/dev/null || true
sleep 1

# ---- Suite 4: Local zones + Overrides API ----
echo ""
echo "╔══════════════════════════════════════════╗"
echo "║  Suite 4: Local Zones + Overrides API    ║"
echo "╚══════════════════════════════════════════╝"

cat > "$CONFIG" << 'CONF'
[server]
bind_addr = "127.0.0.1:5354"
api_port = 5381

[upstream]
mode = "forward"
address = "9.9.9.9"
port = 53

[cache]
max_entries = 10000

[blocking]
enabled = false

[proxy]
enabled = false

[[zones]]
domain = "test.local"
record_type = "A"
value = "10.0.0.1"
ttl = 60

[[zones]]
domain = "mail.local"
record_type = "MX"
value = "10 smtp.local"
ttl = 60
CONF

RUST_LOG=info "$BINARY" "$CONFIG" > "$LOG" 2>&1 &
NUMA_PID=$!
sleep 3

echo ""
echo "=== Local Zones ==="

check "Local A record (test.local)" \
    "10.0.0.1" \
    "$($DIG test.local A +short)"

check "Local MX record (mail.local)" \
    "smtp.local" \
    "$($DIG mail.local MX +short)"

check "Non-local domain still resolves" \
    "." \
    "$($DIG example.com A +short)"

echo ""
echo "=== Overrides API ==="

# Create override
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -X POST http://127.0.0.1:$API_PORT/overrides \
    -H 'Content-Type: application/json' \
    -d '{"domain":"override.test","target":"192.168.1.100","duration_secs":60}')
# Anchored so that only 200 or 201 pass (a bare "20" would also match 520).
check "Create override (HTTP 200/201)" \
    '^20[01]$' \
    "$HTTP_CODE"

sleep 1

check "Override resolves" \
    "192.168.1.100" \
    "$($DIG override.test A +short)"

# List overrides
check "List overrides" \
    "override.test" \
    "$(curl -s http://127.0.0.1:$API_PORT/overrides)"

# Delete override
curl -s -X DELETE http://127.0.0.1:$API_PORT/overrides/override.test > /dev/null

sleep 1

# After delete, should not resolve to override
AFTER_DELETE=$($DIG override.test A +short 2>/dev/null || true)
if echo "$AFTER_DELETE" | grep -q "192.168.1.100"; then
    check "Override deleted" "not-192.168.1.100" "$AFTER_DELETE"
else
    check "Override deleted" "." "deleted"
fi

echo ""
echo "=== Cache API ==="

check "Cache list" \
    "domain" \
    "$(curl -s http://127.0.0.1:$API_PORT/cache)"

# Flush cache
curl -s -X DELETE http://127.0.0.1:$API_PORT/cache > /dev/null
# Anchored so that a count such as 10 does not pass as "flushed".
check "Cache flushed" \
    '^0$' \
    "$(curl -s http://127.0.0.1:$API_PORT/stats | grep -o '"entries":[0-9]*' | grep -o '[0-9]*')"

kill "$NUMA_PID" 2>/dev/null || true
wait "$NUMA_PID" 2>/dev/null || true

# Summary
echo ""
TOTAL=$((PASSED + FAILED))
if [ "$FAILED" -eq 0 ]; then
    printf "${GREEN}All %d tests passed.${RESET}\n" "$TOTAL"
    exit 0
else
    printf "${RED}%d/%d tests failed.${RESET}\n" "$FAILED" "$TOTAL"
    echo ""
    echo "Log: $LOG"
    exit 1
fi