Files
numa/src/ctx.rs
Razvan Dimescu b6703b4315 feat: recursive DNS + DNSSEC + TCP fallback (#17)
* feat: recursive resolution + full DNSSEC validation

Numa becomes a true DNS resolver — resolves from root nameservers
with complete DNSSEC chain-of-trust verification.

Recursive resolution:
- Iterative RFC 1034 from configurable root hints (13 default)
- CNAME chasing (depth 8), referral following (depth 10)
- A+AAAA glue extraction, IPv6 nameserver support
- TLD priming: NS + DS + DNSKEY for 34 gTLDs + EU ccTLDs
- Config: mode = "recursive" in [upstream], root_hints, prime_tlds

DNSSEC (all 4 phases):
- EDNS0 OPT pseudo-record (DO bit, 1232 payload per DNS Flag Day 2020)
- DNSKEY, DS, RRSIG, NSEC, NSEC3 record types with wire read/write
- Signature verification via ring: RSA/SHA-256, ECDSA P-256, Ed25519
- Chain-of-trust: zone DNSKEY → parent DS → root KSK (key tag 20326)
- DNSKEY RRset self-signature verification (RRSIG(DNSKEY) by KSK)
- RRSIG expiration/inception time validation
- NSEC: NXDOMAIN gap proofs, NODATA type absence, wildcard denial
- NSEC3: SHA-1 iterated hashing, closest encloser proof, hash range
- Authority RRSIG verification for denial proofs
- Config: [dnssec] enabled/strict (default false, opt-in)
- AD bit on Secure, SERVFAIL on Bogus+strict
- DnssecStatus cached per entry, ValidationStats logging

Performance:
- TLD chain pre-warmed on startup (root DNSKEY + TLD DS/DNSKEY)
- Referral DS piggybacking from authority sections
- DNSKEY prefetch before validation loop
- Cold-cache validation: ~1 DNSKEY fetch (down from 5)
- Benchmarks: RSA 10.9µs, ECDSA 174ns, DS verify 257ns

Also:
- write_qname fix for root domain "." (was producing malformed queries)
- write_record_header() dedup, write_bytes() bulk writes
- DnsRecord::domain() + query_type() accessors
- UpstreamMode enum, DEFAULT_EDNS_PAYLOAD const
- Real glue TTL (was hardcoded 3600)
- DNSSEC restricted to recursive mode only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: TCP fallback, query minimization, UDP auto-disable

Transport resilience for restrictive networks (ISPs blocking UDP:53):
- DNS-over-TCP fallback: UDP fail/truncation → automatic TCP retry
- UDP auto-disable: after 3 consecutive failures, switch to TCP-first
- IPv6 → TCP directly (UDP socket binds 0.0.0.0, can't reach IPv6)
- Network change resets UDP detection for re-probing
- Root hint rotation in TLD priming

Privacy:
- RFC 7816 query minimization: root servers see TLD only, not full name

Code quality:
- Merged find_starting_ns + find_starting_zone → find_closest_ns
- Extracted resolve_ns_addrs_from_glue shared helper
- Removed overall timeout wrapper (per-hop timeouts sufficient)
- forward_tcp for DNS-over-TCP (RFC 1035 §4.2.2)

Testing:
- Mock TCP-only DNS server for fallback tests (no network needed)
- tcp_fallback_resolves_when_udp_blocked
- tcp_only_iterative_resolution
- tcp_fallback_handles_nxdomain
- udp_auto_disable_resets
- Integration test suite (4 suites, 51 tests)
- Network probe script (tests/network-probe.sh)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: DNSSEC verified badge in dashboard query log

- Add dnssec field to QueryLogEntry, track validation status per query
- DnssecStatus::as_str() for API serialization
- Dashboard shows green checkmark next to DNSSEC-verified responses
- Blog post: add "How keys get there" section, transport resilience section,
  trim code blocks, update What's Next

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use SVG shield for DNSSEC badge, update blog HTML

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: NS cache lookup from authorities, UDP re-probe, shield alignment

- find_closest_ns checks authorities (not just answers) for NS records,
  fixing TLD priming cache misses that caused redundant root queries
- Periodic UDP re-probe every 5min when disabled — re-enables UDP
  after switching from a restrictive network to an open one
- Dashboard DNSSEC shield uses fixed-width container for alignment
- Blog post: tuck key-tag into trust anchor paragraph

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: TCP single-write, mock server consistency, integration tests

- TCP single-write fix: combine length prefix + message to avoid split
  segments that Microsoft/Azure DNS servers reject
- Mock server (spawn_tcp_dns_server) updated to use single-write too
- Tests: forward_tcp_wire_format, forward_tcp_single_segment_write
- Integration: real-server checks for Microsoft/Office/Azure domains

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: recursive bar in dashboard, special-use domain interception

Dashboard:
- Add Recursive bar to resolution paths chart (cyan, distinct from Override)
- Add RECURSIVE path tag style in query log

Special-use domains (RFC 6761/6303/8880/9462):
- .localhost → 127.0.0.1 (RFC 6761)
- Private reverse PTR (10.x, 192.168.x, 172.16-31.x) → NXDOMAIN
- _dns.resolver.arpa (DDR) → NXDOMAIN
- ipv4only.arpa (NAT64) → 192.0.0.170/171
- mDNS service discovery for private ranges → NXDOMAIN

Eliminates ~900ms SERVFAILs for macOS system queries that were
hitting root servers unnecessarily.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: move generated blog HTML to site/blog/posts/, gitignore

- Generated HTML now in site/blog/posts/ (gitignored)
- CI workflow runs pandoc + make blog before deploy
- Updated all internal blog links to /blog/posts/ path
- blog/*.md remains the source of truth

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review feedback — memory ordering, RRSIG time, NS resolution

- Ordering::Relaxed → Acquire/Release for UDP_DISABLED/UDP_FAILURES
  (ARM correctness for cross-thread coordination)
- RRSIG time validation: serial number arithmetic (RFC 4034 §3.1.5)
  + 300s clock skew fudge factor (matches BIND)
- resolve_ns_addrs_from_glue collects addresses from ALL NS names,
  not just the first with glue (improves failover)
- is_special_use_domain: eliminate 16 format! allocations per
  .in-addr.arpa query (parse octet instead)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: API endpoint tests, coverage target

- 8 new axum handler tests: health, stats, query-log, overrides CRUD,
  cache, blocking stats, services CRUD, dashboard HTML
- Tests use tower::oneshot — no network, no server startup
- test_ctx() builds minimal ServerCtx for isolated testing
- `make coverage` target (cargo-tarpaulin), separate from `make all`
- 82 total tests (was 74)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 04:03:47 +02:00

405 lines
15 KiB
Rust

use std::net::SocketAddr;
use std::path::PathBuf;
use std::sync::{Mutex, RwLock};
use std::time::{Duration, Instant, SystemTime};
use arc_swap::ArcSwap;
use log::{debug, error, info, warn};
use rustls::ServerConfig;
use tokio::net::UdpSocket;
use crate::blocklist::BlocklistStore;
use crate::buffer::BytePacketBuffer;
use crate::cache::{DnsCache, DnssecStatus};
use crate::config::{UpstreamMode, ZoneMap};
use crate::forward::{forward_query, Upstream};
use crate::header::ResultCode;
use crate::lan::PeerStore;
use crate::override_store::OverrideStore;
use crate::packet::DnsPacket;
use crate::query_log::{QueryLog, QueryLogEntry};
use crate::question::QueryType;
use crate::record::DnsRecord;
use crate::service_store::ServiceStore;
use crate::stats::{QueryPath, ServerStats};
use crate::system_dns::ForwardingRule;
pub struct ServerCtx {
pub socket: UdpSocket,
pub zone_map: ZoneMap,
/// std::sync::RwLock (not tokio) — locks must never be held across .await points.
pub cache: RwLock<DnsCache>,
pub stats: Mutex<ServerStats>,
pub overrides: RwLock<OverrideStore>,
pub blocklist: RwLock<BlocklistStore>,
pub query_log: Mutex<QueryLog>,
pub services: Mutex<ServiceStore>,
pub lan_peers: Mutex<PeerStore>,
pub forwarding_rules: Vec<ForwardingRule>,
pub upstream: Mutex<Upstream>,
pub upstream_auto: bool,
pub upstream_port: u16,
pub lan_ip: Mutex<std::net::Ipv4Addr>,
pub timeout: Duration,
pub proxy_tld: String,
pub proxy_tld_suffix: String, // pre-computed ".{tld}" to avoid per-query allocation
pub lan_enabled: bool,
pub config_path: String,
pub config_found: bool,
pub config_dir: PathBuf,
pub data_dir: PathBuf,
pub tls_config: Option<ArcSwap<ServerConfig>>,
pub upstream_mode: UpstreamMode,
pub root_hints: Vec<SocketAddr>,
pub dnssec_enabled: bool,
pub dnssec_strict: bool,
}
pub async fn handle_query(
mut buffer: BytePacketBuffer,
src_addr: SocketAddr,
ctx: &ServerCtx,
) -> crate::Result<()> {
let start = Instant::now();
let query = match DnsPacket::from_buffer(&mut buffer) {
Ok(packet) => packet,
Err(e) => {
warn!("{} | PARSE ERROR | {}", src_addr, e);
return Ok(());
}
};
let (qname, qtype) = match query.questions.first() {
Some(q) => (q.name.clone(), q.qtype),
None => return Ok(()),
};
// Pipeline: overrides -> .tld interception -> blocklist -> local zones -> cache -> upstream
// Each lock is scoped to avoid holding MutexGuard across await points.
let (response, path, dnssec) = {
let override_record = ctx.overrides.read().unwrap().lookup(&qname);
if let Some(record) = override_record {
let mut resp = DnsPacket::response_from(&query, ResultCode::NOERROR);
resp.answers.push(record);
(resp, QueryPath::Overridden, DnssecStatus::Indeterminate)
} else if qname == "localhost" || qname.ends_with(".localhost") {
// RFC 6761: .localhost always resolves to loopback
let mut resp = DnsPacket::response_from(&query, ResultCode::NOERROR);
match qtype {
QueryType::AAAA => resp.answers.push(DnsRecord::AAAA {
domain: qname.clone(),
addr: std::net::Ipv6Addr::LOCALHOST,
ttl: 300,
}),
_ => resp.answers.push(DnsRecord::A {
domain: qname.clone(),
addr: std::net::Ipv4Addr::LOCALHOST,
ttl: 300,
}),
}
(resp, QueryPath::Local, DnssecStatus::Indeterminate)
} else if is_special_use_domain(&qname) {
// RFC 6761/8880: private PTR, DDR, NAT64 — answer locally
let resp = special_use_response(&query, &qname, qtype);
(resp, QueryPath::Local, DnssecStatus::Indeterminate)
} else if !ctx.proxy_tld_suffix.is_empty()
&& (qname.ends_with(&ctx.proxy_tld_suffix) || qname == ctx.proxy_tld)
{
// Resolve .numa: local services → 127.0.0.1, LAN peers → peer IP
let service_name = qname.strip_suffix(&ctx.proxy_tld_suffix).unwrap_or(&qname);
let resolve_ip = {
let local = ctx.services.lock().unwrap();
if local.lookup(service_name).is_some() {
std::net::Ipv4Addr::LOCALHOST
} else {
let mut peers = ctx.lan_peers.lock().unwrap();
peers
.lookup(service_name)
.and_then(|(ip, _)| match ip {
std::net::IpAddr::V4(v4) => Some(v4),
_ => None,
})
.unwrap_or(std::net::Ipv4Addr::LOCALHOST)
}
};
let mut resp = DnsPacket::response_from(&query, ResultCode::NOERROR);
match qtype {
QueryType::AAAA => resp.answers.push(DnsRecord::AAAA {
domain: qname.clone(),
addr: if resolve_ip == std::net::Ipv4Addr::LOCALHOST {
std::net::Ipv6Addr::LOCALHOST
} else {
resolve_ip.to_ipv6_mapped()
},
ttl: 300,
}),
_ => resp.answers.push(DnsRecord::A {
domain: qname.clone(),
addr: resolve_ip,
ttl: 300,
}),
}
(resp, QueryPath::Local, DnssecStatus::Indeterminate)
} else if ctx.blocklist.read().unwrap().is_blocked(&qname) {
let mut resp = DnsPacket::response_from(&query, ResultCode::NOERROR);
match qtype {
QueryType::AAAA => resp.answers.push(DnsRecord::AAAA {
domain: qname.clone(),
addr: std::net::Ipv6Addr::UNSPECIFIED,
ttl: 60,
}),
_ => resp.answers.push(DnsRecord::A {
domain: qname.clone(),
addr: std::net::Ipv4Addr::UNSPECIFIED,
ttl: 60,
}),
}
(resp, QueryPath::Blocked, DnssecStatus::Indeterminate)
} else if let Some(records) = ctx.zone_map.get(qname.as_str()).and_then(|m| m.get(&qtype)) {
let mut resp = DnsPacket::response_from(&query, ResultCode::NOERROR);
resp.answers = records.clone();
(resp, QueryPath::Local, DnssecStatus::Indeterminate)
} else {
let cached = ctx.cache.read().unwrap().lookup_with_status(&qname, qtype);
if let Some((cached, cached_dnssec)) = cached {
let mut resp = cached;
resp.header.id = query.header.id;
if cached_dnssec == DnssecStatus::Secure {
resp.header.authed_data = true;
}
(resp, QueryPath::Cached, cached_dnssec)
} else if ctx.upstream_mode == UpstreamMode::Recursive {
match crate::recursive::resolve_recursive(
&qname,
qtype,
&ctx.cache,
&query,
&ctx.root_hints,
)
.await
{
Ok(resp) => (resp, QueryPath::Recursive, DnssecStatus::Indeterminate),
Err(e) => {
error!(
"{} | {:?} {} | RECURSIVE ERROR | {}",
src_addr, qtype, qname, e
);
(
DnsPacket::response_from(&query, ResultCode::SERVFAIL),
QueryPath::UpstreamError,
DnssecStatus::Indeterminate,
)
}
}
} else {
let upstream =
match crate::system_dns::match_forwarding_rule(&qname, &ctx.forwarding_rules) {
Some(addr) => Upstream::Udp(addr),
None => ctx.upstream.lock().unwrap().clone(),
};
match forward_query(&query, &upstream, ctx.timeout).await {
Ok(resp) => {
ctx.cache.write().unwrap().insert(&qname, qtype, &resp);
(resp, QueryPath::Forwarded, DnssecStatus::Indeterminate)
}
Err(e) => {
error!(
"{} | {:?} {} | UPSTREAM ERROR | {}",
src_addr, qtype, qname, e
);
(
DnsPacket::response_from(&query, ResultCode::SERVFAIL),
QueryPath::UpstreamError,
DnssecStatus::Indeterminate,
)
}
}
}
}
};
let client_do = query.edns.as_ref().is_some_and(|e| e.do_bit);
let mut response = response;
// DNSSEC validation (recursive/forwarded responses only)
let mut dnssec = dnssec;
if ctx.dnssec_enabled && path == QueryPath::Recursive {
let (status, vstats) =
crate::dnssec::validate_response(&response, &ctx.cache, &ctx.root_hints).await;
debug!(
"DNSSEC | {} | {:?} | {}ms | dnskey_hit={} dnskey_fetch={} ds_hit={} ds_fetch={}",
qname,
status,
vstats.elapsed_ms,
vstats.dnskey_cache_hits,
vstats.dnskey_fetches,
vstats.ds_cache_hits,
vstats.ds_fetches,
);
dnssec = status;
if status == DnssecStatus::Secure {
response.header.authed_data = true;
}
if status == DnssecStatus::Bogus && ctx.dnssec_strict {
response = DnsPacket::response_from(&query, ResultCode::SERVFAIL);
}
ctx.cache
.write()
.unwrap()
.insert_with_status(&qname, qtype, &response, status);
}
// Strip DNSSEC records if client didn't set DO bit
if !client_do {
strip_dnssec_records(&mut response);
}
// Echo EDNS back if client sent it
if query.edns.is_some() {
response.edns = Some(crate::packet::EdnsOpt {
do_bit: client_do,
..Default::default()
});
}
let elapsed = start.elapsed();
info!(
"{} | {:?} {} | {} | {} | {}ms",
src_addr,
qtype,
qname,
path.as_str(),
response.header.rescode.as_str(),
elapsed.as_millis(),
);
debug!(
"response: {} answers, {} authorities, {} resources",
response.answers.len(),
response.authorities.len(),
response.resources.len(),
);
let mut resp_buffer = BytePacketBuffer::new();
if response.write(&mut resp_buffer).is_err() {
// Response too large for UDP — set TC bit and send header + question only
debug!("response too large, setting TC bit for {}", qname);
let mut tc_response = DnsPacket::response_from(&query, response.header.rescode);
tc_response.header.truncated_message = true;
let mut tc_buffer = BytePacketBuffer::new();
tc_response.write(&mut tc_buffer)?;
ctx.socket.send_to(tc_buffer.filled(), src_addr).await?;
} else {
ctx.socket.send_to(resp_buffer.filled(), src_addr).await?;
}
// Record stats and query log
{
let mut s = ctx.stats.lock().unwrap();
let total = s.record(path);
if total.is_multiple_of(1000) {
s.log_summary();
}
}
ctx.query_log.lock().unwrap().push(QueryLogEntry {
timestamp: SystemTime::now(),
src_addr,
domain: qname,
query_type: qtype,
path,
rescode: response.header.rescode,
latency_us: elapsed.as_micros() as u64,
dnssec,
});
Ok(())
}
fn is_dnssec_record(r: &DnsRecord) -> bool {
matches!(
r.query_type(),
QueryType::RRSIG | QueryType::DNSKEY | QueryType::DS | QueryType::NSEC | QueryType::NSEC3
)
}
fn strip_dnssec_records(pkt: &mut DnsPacket) {
pkt.answers.retain(|r| !is_dnssec_record(r));
pkt.authorities.retain(|r| !is_dnssec_record(r));
pkt.resources.retain(|r| !is_dnssec_record(r));
}
fn is_special_use_domain(qname: &str) -> bool {
if qname.ends_with(".in-addr.arpa") {
// RFC 6303: private + loopback + link-local reverse DNS
if qname.ends_with(".10.in-addr.arpa")
|| qname.ends_with(".168.192.in-addr.arpa")
|| qname.ends_with(".127.in-addr.arpa")
|| qname.ends_with(".254.169.in-addr.arpa")
|| qname.ends_with(".0.in-addr.arpa")
|| qname.contains("_dns-sd._udp")
{
return true;
}
// 172.16-31.x.x (RFC 1918) — extract second octet from reverse name
if qname.ends_with(".172.in-addr.arpa") {
if let Some(octet_str) = qname
.strip_suffix(".172.in-addr.arpa")
.and_then(|s| s.rsplit('.').next())
{
if let Ok(octet) = octet_str.parse::<u8>() {
return (16..=31).contains(&octet);
}
}
}
return false;
}
// DDR (RFC 9462)
if qname == "_dns.resolver.arpa" || qname.ends_with("._dns.resolver.arpa") {
return true;
}
// NAT64 (RFC 8880)
qname == "ipv4only.arpa"
}
fn special_use_response(query: &DnsPacket, qname: &str, qtype: QueryType) -> DnsPacket {
use std::net::{Ipv4Addr, Ipv6Addr};
if qname == "ipv4only.arpa" {
// RFC 8880: well-known NAT64 addresses
let mut resp = DnsPacket::response_from(query, ResultCode::NOERROR);
let domain = qname.to_string();
match qtype {
QueryType::A => {
resp.answers.push(DnsRecord::A {
domain: domain.clone(),
addr: Ipv4Addr::new(192, 0, 0, 170),
ttl: 300,
});
resp.answers.push(DnsRecord::A {
domain,
addr: Ipv4Addr::new(192, 0, 0, 171),
ttl: 300,
});
}
QueryType::AAAA => {
resp.answers.push(DnsRecord::AAAA {
domain,
addr: Ipv6Addr::new(0x0064, 0xff9b, 0, 0, 0, 0, 0xc000, 0x00aa),
ttl: 300,
});
}
_ => {}
}
resp
} else {
DnsPacket::response_from(query, ResultCode::NXDOMAIN)
}
}