feat(odoh): ship ODoH client + self-hosted relay (RFC 9230)

Client (mode = "odoh"): URL-query target routing per RFC 9230 §5,
/.well-known/odohconfigs TTL cache with 60s backoff on failure, HPKE
seal/open via odoh-rs, strict-mode default that SERVFAILs on relay
failure instead of silently downgrading. Host-equality config
validation rejects same-operator relay/target pairs.

Relay (`numa relay [PORT]`): axum server with /relay + /health.
SSRF-hardened hostname validator (RFC 1035 ASCII + dot + dash),
4 KiB body cap at the axum layer, 5s full-transaction timeout, and
static 502 on target failure (reqwest internals logged, not leaked).
Aggregate counters only — no per-request logs.

Observability: new `UpstreamTransport { Udp, Doh, Dot, Odoh }`
orthogonal to `QueryPath`, so /stats can tally wire protocols
symmetrically. Recursive mode records `Some(Udp)` for honest
"bytes egressing in cleartext" accounting.

Tests: Suite 8 exercises the client end-to-end via Frank Denis's
public relay + Cloudflare target; Suite 9 exercises `numa relay`
forwarding + guards against Cloudflare as the real far end. Full
probe script at tests/probe-odoh-ecosystem.sh verifies the entire
public ODoH ecosystem (4 targets + 1 relay per DNSCrypt's curated
list — confirms deploying Numa's relay doubles global supply).
This commit is contained in:
Razvan Dimescu
2026-04-20 12:34:04 +03:00
parent f6cfb3ce1b
commit 241c40553b
14 changed files with 1911 additions and 46 deletions

View File

@@ -105,6 +105,7 @@ pub async fn resolve_query(
// Pipeline: overrides -> .localhost -> local zones -> special-use (unless forwarded)
// -> .tld proxy -> blocklist -> cache -> forwarding -> recursive/upstream
// Each lock is scoped to avoid holding MutexGuard across await points.
let mut upstream_transport: Option<crate::stats::UpstreamTransport> = None;
let (response, path, dnssec) = {
let override_record = ctx.overrides.read().unwrap().lookup(&qname);
if let Some(record) = override_record {
@@ -208,6 +209,7 @@ pub async fn resolve_query(
{
// Conditional forwarding takes priority over recursive mode
// (e.g. Tailscale .ts.net, VPC private zones)
upstream_transport = pool.preferred().map(|u| u.transport());
match forward_with_failover_raw(
raw_wire,
pool,
@@ -241,6 +243,9 @@ pub async fn resolve_query(
}
}
} else if ctx.upstream_mode == UpstreamMode::Recursive {
// Recursive resolution makes UDP hops to roots/TLDs/auths;
// tag as Udp so the dashboard can aggregate plaintext-wire
// egress honestly. Only mark on success — errors stay None.
let key = (qname.clone(), qtype);
let (resp, path, err) = resolve_coalesced(&ctx.inflight, key, &query, || {
crate::recursive::resolve_recursive(
@@ -263,6 +268,8 @@ pub async fn resolve_query(
qname,
err.as_deref().unwrap_or("leader failed")
);
} else {
upstream_transport = Some(crate::stats::UpstreamTransport::Udp);
}
(resp, path, DnssecStatus::Indeterminate)
} else {
@@ -277,7 +284,10 @@ pub async fn resolve_query(
.await
{
Ok(resp_wire) => match cache_and_parse(ctx, &qname, qtype, &resp_wire) {
Ok(resp) => (resp, QueryPath::Upstream, DnssecStatus::Indeterminate),
Ok(resp) => {
upstream_transport = pool.preferred().map(|u| u.transport());
(resp, QueryPath::Upstream, DnssecStatus::Indeterminate)
}
Err(e) => {
error!("{} | {:?} {} | PARSE ERROR | {}", src_addr, qtype, qname, e);
(
@@ -397,7 +407,7 @@ pub async fn resolve_query(
// Record stats and query log
{
let mut s = ctx.stats.lock().unwrap();
let total = s.record(path, transport);
let total = s.record(path, transport, upstream_transport);
if total.is_multiple_of(1000) {
s.log_summary();
}