feat: recursive DNS + DNSSEC + TCP fallback (#17)

* feat: recursive resolution + full DNSSEC validation

Numa becomes a true DNS resolver — resolves from root nameservers
with complete DNSSEC chain-of-trust verification.

Recursive resolution:
- Iterative RFC 1034 from configurable root hints (13 default)
- CNAME chasing (depth 8), referral following (depth 10)
- A+AAAA glue extraction, IPv6 nameserver support
- TLD priming: NS + DS + DNSKEY for 34 gTLDs + EU ccTLDs
- Config: mode = "recursive" in [upstream], root_hints, prime_tlds

DNSSEC (all 4 phases):
- EDNS0 OPT pseudo-record (DO bit, 1232 payload per DNS Flag Day 2020)
- DNSKEY, DS, RRSIG, NSEC, NSEC3 record types with wire read/write
- Signature verification via ring: RSA/SHA-256, ECDSA P-256, Ed25519
- Chain-of-trust: zone DNSKEY → parent DS → root KSK (key tag 20326)
- DNSKEY RRset self-signature verification (RRSIG(DNSKEY) by KSK)
- RRSIG expiration/inception time validation
- NSEC: NXDOMAIN gap proofs, NODATA type absence, wildcard denial
- NSEC3: SHA-1 iterated hashing, closest encloser proof, hash range
- Authority RRSIG verification for denial proofs
- Config: [dnssec] enabled/strict (default false, opt-in)
- AD bit on Secure, SERVFAIL on Bogus+strict
- DnssecStatus cached per entry, ValidationStats logging

Performance:
- TLD chain pre-warmed on startup (root DNSKEY + TLD DS/DNSKEY)
- Referral DS piggybacking from authority sections
- DNSKEY prefetch before validation loop
- Cold-cache validation: ~1 DNSKEY fetch (down from 5)
- Benchmarks: RSA 10.9µs, ECDSA 174ns, DS verify 257ns

Also:
- write_qname fix for root domain "." (was producing malformed queries)
- write_record_header() dedup, write_bytes() bulk writes
- DnsRecord::domain() + query_type() accessors
- UpstreamMode enum, DEFAULT_EDNS_PAYLOAD const
- Real glue TTL (was hardcoded 3600)
- DNSSEC restricted to recursive mode only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: TCP fallback, query minimization, UDP auto-disable

Transport resilience for restrictive networks (ISPs blocking UDP:53):
- DNS-over-TCP fallback: UDP fail/truncation → automatic TCP retry
- UDP auto-disable: after 3 consecutive failures, switch to TCP-first
- IPv6 → TCP directly (UDP socket binds 0.0.0.0, can't reach IPv6)
- Network change resets UDP detection for re-probing
- Root hint rotation in TLD priming

Privacy:
- RFC 7816 query minimization: root servers see TLD only, not full name

Code quality:
- Merged find_starting_ns + find_starting_zone → find_closest_ns
- Extracted resolve_ns_addrs_from_glue shared helper
- Removed overall timeout wrapper (per-hop timeouts sufficient)
- forward_tcp for DNS-over-TCP (RFC 1035 §4.2.2)

Testing:
- Mock TCP-only DNS server for fallback tests (no network needed)
- tcp_fallback_resolves_when_udp_blocked
- tcp_only_iterative_resolution
- tcp_fallback_handles_nxdomain
- udp_auto_disable_resets
- Integration test suite (4 suites, 51 tests)
- Network probe script (tests/network-probe.sh)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: DNSSEC verified badge in dashboard query log

- Add dnssec field to QueryLogEntry, track validation status per query
- DnssecStatus::as_str() for API serialization
- Dashboard shows green checkmark next to DNSSEC-verified responses
- Blog post: add "How keys get there" section, transport resilience section,
  trim code blocks, update What's Next

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use SVG shield for DNSSEC badge, update blog HTML

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: NS cache lookup from authorities, UDP re-probe, shield alignment

- find_closest_ns checks authorities (not just answers) for NS records,
  fixing TLD priming cache misses that caused redundant root queries
- Periodic UDP re-probe every 5min when disabled — re-enables UDP
  after switching from a restrictive network to an open one
- Dashboard DNSSEC shield uses fixed-width container for alignment
- Blog post: tuck key-tag into trust anchor paragraph

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: TCP single-write, mock server consistency, integration tests

- TCP single-write fix: combine length prefix + message to avoid split
  segments that Microsoft/Azure DNS servers reject
- Mock server (spawn_tcp_dns_server) updated to use single-write too
- Tests: forward_tcp_wire_format, forward_tcp_single_segment_write
- Integration: real-server checks for Microsoft/Office/Azure domains

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: recursive bar in dashboard, special-use domain interception

Dashboard:
- Add Recursive bar to resolution paths chart (cyan, distinct from Override)
- Add RECURSIVE path tag style in query log

Special-use domains (RFC 6761/6303/8880/9462):
- .localhost → 127.0.0.1 (RFC 6761)
- Private reverse PTR (10.x, 192.168.x, 172.16-31.x) → NXDOMAIN
- _dns.resolver.arpa (DDR) → NXDOMAIN
- ipv4only.arpa (NAT64) → 192.0.0.170/171
- mDNS service discovery for private ranges → NXDOMAIN

Eliminates ~900ms SERVFAILs for macOS system queries that were
hitting root servers unnecessarily.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: move generated blog HTML to site/blog/posts/, gitignore

- Generated HTML now in site/blog/posts/ (gitignored)
- CI workflow runs pandoc + make blog before deploy
- Updated all internal blog links to /blog/posts/ path
- blog/*.md remains the source of truth

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review feedback — memory ordering, RRSIG time, NS resolution

- Ordering::Relaxed → Acquire/Release for UDP_DISABLED/UDP_FAILURES
  (ARM correctness for cross-thread coordination)
- RRSIG time validation: serial number arithmetic (RFC 4034 §3.1.5)
  + 300s clock skew fudge factor (matches BIND)
- resolve_ns_addrs_from_glue collects addresses from ALL NS names,
  not just the first with glue (improves failover)
- is_special_use_domain: eliminate 16 format! allocations per
  .in-addr.arpa query (parse octet instead)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: API endpoint tests, coverage target

- 8 new axum handler tests: health, stats, query-log, overrides CRUD,
  cache, blocking stats, services CRUD, dashboard HTML
- Tests use tower::oneshot — no network, no server startup
- test_ctx() builds minimal ServerCtx for isolated testing
- `make coverage` target (cargo-tarpaulin), separate from `make all`
- 82 total tests (was 74)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit was merged in pull request #17.
This commit is contained in:
Razvan Dimescu
2026-03-28 04:03:47 +02:00
committed by GitHub
parent cc8d3c7a83
commit b6703b4315
31 changed files with 5477 additions and 776 deletions

View File

@@ -27,6 +27,8 @@ pub struct Config {
pub services: Vec<ServiceConfig>,
#[serde(default)]
pub lan: LanConfig,
#[serde(default)]
pub dnssec: DnssecConfig,
}
#[derive(Deserialize)]
@@ -61,26 +63,112 @@ fn default_api_port() -> u16 {
5380
}
#[derive(Deserialize, Default, PartialEq, Eq, Clone, Copy)]
#[serde(rename_all = "lowercase")]
pub enum UpstreamMode {
#[default]
Forward,
Recursive,
}
#[derive(Deserialize)]
pub struct UpstreamConfig {
#[serde(default)]
pub mode: UpstreamMode,
#[serde(default = "default_upstream_addr")]
pub address: String,
#[serde(default = "default_upstream_port")]
pub port: u16,
#[serde(default = "default_timeout_ms")]
pub timeout_ms: u64,
#[serde(default = "default_root_hints")]
pub root_hints: Vec<String>,
#[serde(default = "default_prime_tlds")]
pub prime_tlds: Vec<String>,
}
impl Default for UpstreamConfig {
fn default() -> Self {
UpstreamConfig {
mode: UpstreamMode::default(),
address: default_upstream_addr(),
port: default_upstream_port(),
timeout_ms: default_timeout_ms(),
root_hints: default_root_hints(),
prime_tlds: default_prime_tlds(),
}
}
}
fn default_prime_tlds() -> Vec<String> {
vec![
// gTLDs
"com".into(),
"net".into(),
"org".into(),
"info".into(),
"io".into(),
"dev".into(),
"app".into(),
"xyz".into(),
"me".into(),
// EU + European ccTLDs
"eu".into(),
"uk".into(),
"de".into(),
"fr".into(),
"nl".into(),
"it".into(),
"es".into(),
"pl".into(),
"se".into(),
"no".into(),
"dk".into(),
"fi".into(),
"at".into(),
"be".into(),
"ie".into(),
"pt".into(),
"cz".into(),
"ro".into(),
"gr".into(),
"hu".into(),
"bg".into(),
"hr".into(),
"sk".into(),
"si".into(),
"lt".into(),
"lv".into(),
"ee".into(),
"ch".into(),
"is".into(),
// Other major ccTLDs
"co".into(),
"br".into(),
"au".into(),
"ca".into(),
"jp".into(),
]
}
fn default_root_hints() -> Vec<String> {
vec![
"198.41.0.4".into(), // a.root-servers.net
"199.9.14.201".into(), // b.root-servers.net
"192.33.4.12".into(), // c.root-servers.net
"199.7.91.13".into(), // d.root-servers.net
"192.203.230.10".into(), // e.root-servers.net
"192.5.5.241".into(), // f.root-servers.net
"192.112.36.4".into(), // g.root-servers.net
"198.97.190.53".into(), // h.root-servers.net
"192.36.148.17".into(), // i.root-servers.net
"192.58.128.30".into(), // j.root-servers.net
"193.0.14.129".into(), // k.root-servers.net
"199.7.83.42".into(), // l.root-servers.net
"202.12.27.33".into(), // m.root-servers.net
]
}
fn default_upstream_addr() -> String {
String::new() // empty = auto-detect from system resolver
}
@@ -88,7 +176,7 @@ fn default_upstream_port() -> u16 {
53
}
fn default_timeout_ms() -> u64 {
3000
5000
}
#[derive(Deserialize)]
@@ -250,6 +338,14 @@ fn default_lan_peer_timeout() -> u64 {
90
}
#[derive(Deserialize, Clone, Default)]
pub struct DnssecConfig {
#[serde(default)]
pub enabled: bool,
#[serde(default)]
pub strict: bool,
}
#[cfg(test)]
mod tests {
use super::*;