fix(bootstrap): route numa HTTPS via IP-literal bootstrap resolver (#122)

When numa is its own system DNS resolver (HAOS add-on, Pi-hole-style
container, /etc/resolv.conf → 127.0.0.1), every numa-originated HTTPS
connection — DoH upstream, ODoH relay/target, blocklist CDN — routed
its hostname through getaddrinfo() back to numa itself. Cold boot
deadlocked; steady state taxed every new TCP connection. 0.14.1's
retry-with-backoff masked the startup race but not the underlying
self-loop.

NumaResolver implements reqwest::dns::Resolve with two lanes:
- Per-host overrides (ODoH relay_ip/target_ip) short-circuit DNS
  entirely, preserving ODoH's zero-plain-DNS-leak property.
- Otherwise: A+AAAA in parallel via UDP to IP-literal bootstrap
  servers, with TCP fallback for UDP-hostile networks.

Bootstrap IPs come from upstream.fallback (IP-literal filtered,
hostnames skipped with a warning). Empty fallback yields the
hardcoded default [9.9.9.9, 1.1.1.1]; the chosen source is logged
at startup so the silent default is visible.

doh_keepalive_loop now fires its first tick immediately, and
keepalive_doh logs failures at WARN — bootstrap issues surface
within ~100ms of boot instead of on the first client query.

Distinct from UpstreamPool.fallback (client-query failover) which
stays untouched: client queries with no configured fallback still
SERVFAIL on primary failure rather than silently shadow-routing.

Reproducer: tests/docker/self-resolver-loop.sh. Before: 0 blocklist
domains, 3072ms SERVFAIL. After: 397k domains, 118ms NOERROR.
This commit is contained in:
Razvan Dimescu
2026-04-21 16:19:14 +03:00
parent 31adc31c9b
commit 10469e96bd
8 changed files with 505 additions and 48 deletions

View File

@@ -383,7 +383,7 @@ fn run_default(rt: &tokio::runtime::Runtime) {
/// Library-to-library: Numa forward_query_raw vs Hickory resolver.lookup.
fn run_direct(rt: &tokio::runtime::Runtime) {
let upstream = numa::forward::parse_upstream(DOH_UPSTREAM, 443).expect("failed to parse");
let upstream = numa::forward::parse_upstream(DOH_UPSTREAM, 443, None).expect("failed to parse");
let resolver = rt.block_on(build_hickory_resolver());
let timeout = Duration::from_secs(10);
@@ -609,9 +609,9 @@ fn run_hedge_multi(rt: &tokio::runtime::Runtime, iterations: usize) {
DOMAINS.len()
);
let primary = numa::forward::parse_upstream(DOH_UPSTREAM, 443).expect("failed to parse");
let primary_dual = numa::forward::parse_upstream(DOH_UPSTREAM, 443).expect("failed to parse");
let secondary_dual = numa::forward::parse_upstream(DOH_UPSTREAM, 443).expect("failed to parse");
let primary = numa::forward::parse_upstream(DOH_UPSTREAM, 443, None).expect("failed to parse");
let primary_dual = numa::forward::parse_upstream(DOH_UPSTREAM, 443, None).expect("failed to parse");
let secondary_dual = numa::forward::parse_upstream(DOH_UPSTREAM, 443, None).expect("failed to parse");
let resolver = rt.block_on(build_hickory_resolver());
println!("Warming up...");
@@ -810,7 +810,7 @@ fn run_diag(rt: &tokio::runtime::Runtime) {
fn run_diag_clients(rt: &tokio::runtime::Runtime) {
println!("Client diagnostic: reqwest vs Hickory (20 queries to {DOH_UPSTREAM})\n");
let upstream = numa::forward::parse_upstream(DOH_UPSTREAM, 443).expect("failed to parse");
let upstream = numa::forward::parse_upstream(DOH_UPSTREAM, 443, None).expect("failed to parse");
let resolver = rt.block_on(build_hickory_resolver());
let timeout = Duration::from_secs(10);