32 Commits

Author SHA1 Message Date
Razvan Dimescu
c787de1548 chore: bump version to 0.14.2 2026-04-22 23:57:37 +03:00
Razvan Dimescu
e6e79273b9 Revert "chore: bump version to 0.15.0"
This reverts commit 3ec3b40830.
2026-04-22 23:57:28 +03:00
Razvan Dimescu
3ec3b40830 chore: bump version to 0.15.0 2026-04-22 23:50:20 +03:00
Razvan Dimescu
90fa79bc0f Merge pull request #135 from razvandimescu/fix/hedge-default-off
fix(upstream): default hedge_ms=0 to avoid silent 2x upstream query count
2026-04-22 23:49:15 +03:00
Razvan Dimescu
b8a125b598 fix(upstream): default hedge_ms=0 to avoid silent 2x upstream query count
Hedging fires a second upstream query against the same upstream after
the hedge delay. Rescues packet loss and handshake stalls on flaky
links, but every lookup shows up twice at the provider — silently
halves the headroom for anyone on a quota'd upstream (NextDNS free tier,
Control D, paid Quad9).

Surfaced by #134 (bcookatpcsd), who saw every query duplicated on the
NextDNS dashboard with a single-address DoT upstream. Not a bug — the
feature doing what it says on the tin — but a surprising default.

Flipping the default to 0 makes hedging explicitly opt-in. Users who
want tail-latency rescue on flaky nets add `hedge_ms = 10` (or higher).
No config migration needed; no breaking changes to the API surface.

Also tightens the numa.toml comment so the trade-off is visible at
config time, not retroactively on a provider dashboard.
2026-04-22 23:30:55 +03:00
Razvan Dimescu
bc30be94e7 Merge pull request #131 from razvandimescu/feat/packaging-client-docker
feat(packaging): ODoH client Docker deploy recipe
2026-04-22 23:11:50 +03:00
Razvan Dimescu
26b1cd5917 feat(packaging): ODoH client Docker deploy
Single-container docker-compose recipe for running numa in ODoH client
mode. Ships with a starter numa.toml pointing at odoh-relay.numa.rs
paired with Cloudflare's ODoH target — two independent operators with
distinct eTLD+1s, so the default passes numa's same-operator check.

Exposes :53 UDP+TCP for LAN clients and :5380 for the dashboard + REST
API. README covers prerequisites, deploy, verification, and the ODoH
privacy boundary (relay sees IP, target sees query, neither sees both).

Advertised alongside packaging/relay/ in the main README Docker section.
2026-04-22 18:05:46 +03:00
Razvan Dimescu
77d6d89f80 Merge pull request #130 from razvandimescu/docs/numa-toml-odoh-examples
docs(config): ODoH upstream examples with relay_ip/target_ip pinning
2026-04-22 17:20:19 +03:00
Razvan Dimescu
4fdd05f284 Merge pull request #132 from razvandimescu/chore/site-live-reload
chore(site): live-reload dev server
2026-04-22 17:17:37 +03:00
Razvan Dimescu
2e461ccc0f docs(config): add ODoH upstream examples with relay_ip/target_ip pinning
Complements the bootstrap resolver fix (#122, #126) by documenting the
ODoH knobs in the commented config template. Explains relay_ip/target_ip
as the way to prevent plain-DNS leaks of the relay/target hostnames via
the bootstrap resolver on cold boot when numa is its own system DNS.
2026-04-22 17:13:13 +03:00
Razvan Dimescu
bf84c44346 Merge pull request #133 from razvandimescu/chore/cargo-audit-rustls-webpki
chore: bump rustls-webpki to 0.103.13 (RUSTSEC-2026-0104)
2026-04-22 17:03:58 +03:00
Razvan Dimescu
df2062882c chore: bump rustls-webpki to 0.103.13 for RUSTSEC-2026-0104
Advisory published 2026-04-22: reachable panic in certificate revocation
list parsing. Patch is a lockfile-only bump — transitive via rustls, no
direct dep changes. Unblocks cargo audit in CI across all open PRs.
2026-04-22 16:42:10 +03:00
Razvan Dimescu
76dda89078 Merge pull request #129 from razvandimescu/chore/gitignore-claude
chore: gitignore .claude/ harness state
2026-04-22 16:39:56 +03:00
Razvan Dimescu
640b64bf7e chore(site): live-reload dev server via chokidar + browser-sync
Replaces the plain python3 http.server + one-shot make blog with a
watcher pipeline: chokidar regenerates HTML on MD/template changes,
browser-sync serves the site and reloads the browser on rendered-asset
changes. First run downloads both via npx; subsequent runs are instant.

Preflight checks for npx and pandoc. Port arg parsing is tolerant of
legacy --drafts flag ordering (drafts are always included now, since
that's what the dev loop actually wants).

Cleanup trap kills the watcher on exit so re-runs don't leave orphans.
2026-04-22 15:50:21 +03:00
Razvan Dimescu
5ba19e04c8 chore: gitignore local Claude Code harness state
.claude/ holds per-session harness files (settings.local.json, task
locks, worktree metadata). None of it belongs in the repo.
2026-04-22 15:49:58 +03:00
Razvan Dimescu
c98afafaa1 Merge pull request #127 from razvandimescu/refactor/bootstrap-btreemap
refactor(bootstrap): BTreeMap for overrides + simplify review
2026-04-21 18:41:49 +03:00
Razvan Dimescu
5cba02a6c8 refactor(bootstrap): BTreeMap for overrides + simplify review
- Switch overrides from HashMap to BTreeMap — deterministic iteration by
  type, drops the manual sort when logging.
- Rename the flat_map closure's inner `ips` to `addrs` to stop shadowing
  the outer Vec<String>.
- Trim the Suite 8 TEST-NET-1 comment to keep the "why" and drop
  mechanism narration.
- Drop a redundant sleep 1 after wait — wait already blocks on exit.
2026-04-21 18:37:35 +03:00
Razvan Dimescu
46a95d58aa Merge pull request #126 from razvandimescu/fix/self-resolver-loop
fix(bootstrap): route numa HTTPS via IP-literal bootstrap resolver (#122)
2026-04-21 17:52:51 +03:00
Razvan Dimescu
51cce0347b test(odoh): integration-verify relay_ip/target_ip override wiring
Suite 8 now ends with a config using RFC 5737 TEST-NET-1 IPs as
relay_ip/target_ip, started briefly so the bootstrap resolver logs its
override map. Asserts both host=IP pairs land in that map — closing the
gap flagged on PR #126 (zero-plain-DNS-leak for ODoH endpoints was only
unit-tested).

Also: NumaResolver::new now logs the override map at INFO when non-empty,
so operators can verify their ODoH bootstrap without needing DEBUG level.
2026-04-21 17:43:02 +03:00
Razvan Dimescu
459395203d style: cargo fmt 2026-04-21 16:30:26 +03:00
Razvan Dimescu
10469e96bd fix(bootstrap): route numa HTTPS via IP-literal bootstrap resolver (#122)
When numa is its own system DNS resolver (HAOS add-on, Pi-hole-style
container, /etc/resolv.conf → 127.0.0.1), every numa-originated HTTPS
connection — DoH upstream, ODoH relay/target, blocklist CDN — routed
its hostname through getaddrinfo() back to numa itself. Cold boot
deadlocked; steady state taxed every new TCP connection. 0.14.1's
retry-with-backoff masked the startup race but not the underlying
self-loop.

NumaResolver implements reqwest::dns::Resolve with two lanes:
- Per-host overrides (ODoH relay_ip/target_ip) short-circuit DNS
  entirely, preserving ODoH's zero-plain-DNS-leak property.
- Otherwise: A+AAAA in parallel via UDP to IP-literal bootstrap
  servers, with TCP fallback for UDP-hostile networks.

Bootstrap IPs come from upstream.fallback (IP-literal filtered,
hostnames skipped with a warning). Empty fallback yields the
hardcoded default [9.9.9.9, 1.1.1.1]; the chosen source is logged
at startup so the silent default is visible.

doh_keepalive_loop now fires its first tick immediately, and
keepalive_doh logs failures at WARN — bootstrap issues surface
within ~100ms of boot instead of on the first client query.

Distinct from UpstreamPool.fallback (client-query failover) which
stays untouched: client queries with no configured fallback still
SERVFAIL on primary failure rather than silently shadow-routing.

Reproducer: tests/docker/self-resolver-loop.sh. Before: 0 blocklist
domains, 3072ms SERVFAIL. After: 397k domains, 118ms NOERROR.
2026-04-21 16:19:14 +03:00
Razvan Dimescu
31adc31c9b refactor(ctx): coalesce forward-path upstream queries
resolve_coalesced now takes leader_path: QueryPath and applies to all
three upstream branches (Forwarded-rule, Recursive, Upstream), not just
Recursive. Fixes thundering-herd at boot when N concurrent HTTPS setups
each trigger independent forward queries for the same upstream hostname.
2026-04-21 16:18:52 +03:00
Razvan Dimescu
60600b045f chore: bump version to 0.14.1 2026-04-20 19:27:06 +03:00
Razvan Dimescu
3e6bf3feb0 Merge pull request #125 from razvandimescu/worktree-fix-blocklist-bootstrap
fix(blocklist): retry on transient download failures (#122)
2026-04-20 19:22:04 +03:00
Razvan Dimescu
8bed7c4649 test(blocklist): decouple retry tests from RETRY_DELAYS_SECS length
Derive both the flaky-server drop count and the zero-delay schedule
from RETRY_DELAYS_SECS.len() so the tests keep exercising their
intended invariants — "succeeds on final attempt" and "gives up after
all attempts fail" — if the production retry schedule ever changes.

Also: rename fail_first → drop_first_n to match drop(sock); swap the
giveup test's empty body for an "unreachable" sentinel so a regression
that accidentally served couldn't silently match Some("").
2026-04-20 19:19:43 +03:00
Razvan Dimescu
5b1642c6dc fix(blocklist): retry on transient download failures (#122)
On cold start, reqwest's getaddrinfo can race numa's own first-query
cold-path latency — resolver timeout fires before numa warms its
upstream DoH connection. Wrap each blocklist fetch in 3 retries with
2s/10s/30s backoff; by the second attempt, the upstream is warm and
subsequent getaddrinfos succeed in <100ms.

Also: parallelize fetches across lists via join_all (different hosts,
no warming dependency), walk the full error source chain so reqwest
failures surface the underlying cause, and parameterize retry delays
for unit-test speed.
2026-04-20 19:19:43 +03:00
Razvan Dimescu
01fda7891e Merge pull request #123 from razvandimescu/feat/odoh-etld1-check
feat(odoh): reject relay+target sharing an eTLD+1
2026-04-20 19:06:12 +03:00
Razvan Dimescu
5e84adbd94 Merge pull request #124 from razvandimescu/fix/dashboard-encryption-pct-args
fix(dashboard): pass missing args to encryptionPct in refresh()
2026-04-20 19:05:50 +03:00
Razvan Dimescu
15978a7859 fix(dashboard): pass missing args to encryptionPct in refresh()
Commit eb5ea3b generalised encryptionPct from (transport) to
(data, encryptedKeys, allKeys) and updated renderTransport and
renderUpstreamWire, but missed the call inside render() that computes
the inline `~N/s · M% enc` QPS tag. With undefined allKeys, the
first .reduce() threw TypeError and the render try/catch silently
downgraded the whole dashboard to "disconnected" — every panel left
empty even though /stats was returning real data.

Fix the call site to match the other two (inbound-wire keys) and have
the catch log to console so the next silent-failure regression shows
up in DevTools within seconds instead of a source dive.
2026-04-20 19:04:15 +03:00
Razvan Dimescu
193b38b85f feat(odoh): reject relay+target sharing an eTLD+1
Plain host-string equality caught the copy-paste-same-URL footgun but
let `r.cloudflare.com` + `odoh.cloudflare.com` through — two subdomains
of the same operator collapse ODoH to ordinary DoH. Add a second layer:
compare registrable domains via the PSL (`psl` crate) after the exact-
host check. Fails open on IP literals and unparseable hosts; the exact-
host check still runs in those cases.
2026-04-20 18:46:54 +03:00
Razvan Dimescu
4c685d1602 docs(readme): pamper readme still 2026-04-20 17:19:16 +03:00
Razvan Dimescu
cd6e686a1a docs(readme): surface ODoH in the intro paragraph
Adds the v0.14.0 capability where it's most differentiating: the first
paragraph (sealed-query framing alongside the existing ad-blocking and
.numa-domain pitches) and the second paragraph (numa relay as a public
ODoH endpoint, with the DNSCrypt-list supply-doubling angle as fact).

No reposition: tagline and structure unchanged. ODoH joins the
existing capability set rather than displacing it. Hero GIF stays;
will be re-recorded once the dashboard's Outbound Wire panel is worth
showing in motion.
2026-04-20 17:14:21 +03:00
20 changed files with 1061 additions and 188 deletions

1
.gitignore vendored
View File

@@ -1,6 +1,7 @@
/target /target
/build-dir /build-dir
CLAUDE.md CLAUDE.md
.claude/
docs/ docs/
site/blog/posts/ site/blog/posts/
ios/ ios/

22
Cargo.lock generated
View File

@@ -1547,7 +1547,7 @@ dependencies = [
[[package]] [[package]]
name = "numa" name = "numa"
version = "0.14.0" version = "0.14.2"
dependencies = [ dependencies = [
"arc-swap", "arc-swap",
"axum", "axum",
@@ -1562,6 +1562,7 @@ dependencies = [
"hyper-util", "hyper-util",
"log", "log",
"odoh-rs", "odoh-rs",
"psl",
"qrcode", "qrcode",
"rand_core 0.9.5", "rand_core 0.9.5",
"rcgen", "rcgen",
@@ -1802,6 +1803,21 @@ dependencies = [
"unicode-ident", "unicode-ident",
] ]
[[package]]
name = "psl"
version = "2.1.203"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "76c0777260d32b76a8c3c197646707085d37e79d63b5872a29192c8d4f60f50b"
dependencies = [
"psl-types",
]
[[package]]
name = "psl-types"
version = "2.0.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "33cb294fe86a74cbcf50d4445b37da762029549ebeea341421c7c70370f86cac"
[[package]] [[package]]
name = "qrcode" name = "qrcode"
version = "0.14.1" version = "0.14.1"
@@ -2114,9 +2130,9 @@ dependencies = [
[[package]] [[package]]
name = "rustls-webpki" name = "rustls-webpki"
version = "0.103.12" version = "0.103.13"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8279bb85272c9f10811ae6a6c547ff594d6a7f3c6c6b02ee9726d1d0dcfcdd06" checksum = "61c429a8649f110dddef65e2a5ad240f747e85f7758a6bccc7e5777bd33f756e"
dependencies = [ dependencies = [
"aws-lc-rs", "aws-lc-rs",
"ring", "ring",

View File

@@ -1,6 +1,6 @@
[package] [package]
name = "numa" name = "numa"
version = "0.14.0" version = "0.14.2"
authors = ["razvandimescu <razvan@dimescu.com>"] authors = ["razvandimescu <razvan@dimescu.com>"]
edition = "2021" edition = "2021"
description = "Portable DNS resolver in Rust — .numa local domains, ad blocking, developer overrides, DNS-over-HTTPS" description = "Portable DNS resolver in Rust — .numa local domains, ad blocking, developer overrides, DNS-over-HTTPS"
@@ -30,6 +30,7 @@ tokio-rustls = "0.26"
arc-swap = "1" arc-swap = "1"
ring = "0.17" ring = "0.17"
odoh-rs = "1" odoh-rs = "1"
psl = "2"
# rand_core 0.9 matches the version odoh-rs (via hpke 0.13) depends on, so we # rand_core 0.9 matches the version odoh-rs (via hpke 0.13) depends on, so we
# share one RngCore trait and OsRng impl across the dep tree. # share one RngCore trait and OsRng impl across the dep tree.
rand_core = { version = "0.9", features = ["os_rng"] } rand_core = { version = "0.9", features = ["os_rng"] }

View File

@@ -6,9 +6,9 @@
**DNS you own. Everywhere you go.** — [numa.rs](https://numa.rs) **DNS you own. Everywhere you go.** — [numa.rs](https://numa.rs)
A portable DNS resolver in a single binary. Block ads on any network, name your local services (`frontend.numa`), and override any hostname with auto-revert — all from your laptop, no cloud account or Raspberry Pi required. A portable DNS resolver in a single binary. Block ads on any network, name your local services (`frontend.numa`), override any hostname with auto-revert, and seal every outbound query with **ODoH (RFC 9230)** so no single party sees both who you are and what you asked — all from your laptop, no cloud account or Raspberry Pi required.
Built from scratch in Rust. Zero DNS libraries. RFC 1035 wire protocol parsed by hand. Caching, ad blocking, and local service domains out of the box. Optional recursive resolution from root nameservers with full DNSSEC chain-of-trust validation, plus a DNS-over-TLS listener for encrypted client connections (iOS Private DNS, systemd-resolved, etc.). One ~8MB binary, everything embedded. Built from scratch in Rust. Zero DNS libraries. Caching, ad blocking, and local service domains out of the box. Optional recursive resolution from root nameservers with full DNSSEC chain-of-trust validation, plus a DNS-over-TLS listener for encrypted client connections (iOS Private DNS, systemd-resolved, etc.). Run `numa relay` and the same binary becomes a public ODoH endpoint too — the curated DNSCrypt list currently has one surviving relay, so every Numa deploy materially expands the ecosystem. One ~8MB binary, everything embedded.
![Numa dashboard](assets/hero-demo.gif) ![Numa dashboard](assets/hero-demo.gif)
@@ -125,6 +125,10 @@ docker run -d --name numa --network host \
Multi-arch: `linux/amd64` and `linux/arm64`. Multi-arch: `linux/amd64` and `linux/arm64`.
Turnkey compose recipes:
- [`packaging/client/`](packaging/client/) — ODoH client mode (anonymous DNS), Numa + starter `numa.toml`.
- [`packaging/relay/`](packaging/relay/) — public ODoH relay, Numa + Caddy + ACME.
## How It Compares ## How It Compares
| | Pi-hole | AdGuard Home | Unbound | Numa | | | Pi-hole | AdGuard Home | Unbound | Numa |

View File

@@ -383,7 +383,7 @@ fn run_default(rt: &tokio::runtime::Runtime) {
/// Library-to-library: Numa forward_query_raw vs Hickory resolver.lookup. /// Library-to-library: Numa forward_query_raw vs Hickory resolver.lookup.
fn run_direct(rt: &tokio::runtime::Runtime) { fn run_direct(rt: &tokio::runtime::Runtime) {
let upstream = numa::forward::parse_upstream(DOH_UPSTREAM, 443).expect("failed to parse"); let upstream = numa::forward::parse_upstream(DOH_UPSTREAM, 443, None).expect("failed to parse");
let resolver = rt.block_on(build_hickory_resolver()); let resolver = rt.block_on(build_hickory_resolver());
let timeout = Duration::from_secs(10); let timeout = Duration::from_secs(10);
@@ -609,9 +609,11 @@ fn run_hedge_multi(rt: &tokio::runtime::Runtime, iterations: usize) {
DOMAINS.len() DOMAINS.len()
); );
let primary = numa::forward::parse_upstream(DOH_UPSTREAM, 443).expect("failed to parse"); let primary = numa::forward::parse_upstream(DOH_UPSTREAM, 443, None).expect("failed to parse");
let primary_dual = numa::forward::parse_upstream(DOH_UPSTREAM, 443).expect("failed to parse"); let primary_dual =
let secondary_dual = numa::forward::parse_upstream(DOH_UPSTREAM, 443).expect("failed to parse"); numa::forward::parse_upstream(DOH_UPSTREAM, 443, None).expect("failed to parse");
let secondary_dual =
numa::forward::parse_upstream(DOH_UPSTREAM, 443, None).expect("failed to parse");
let resolver = rt.block_on(build_hickory_resolver()); let resolver = rt.block_on(build_hickory_resolver());
println!("Warming up..."); println!("Warming up...");
@@ -810,7 +812,7 @@ fn run_diag(rt: &tokio::runtime::Runtime) {
fn run_diag_clients(rt: &tokio::runtime::Runtime) { fn run_diag_clients(rt: &tokio::runtime::Runtime) {
println!("Client diagnostic: reqwest vs Hickory (20 queries to {DOH_UPSTREAM})\n"); println!("Client diagnostic: reqwest vs Hickory (20 queries to {DOH_UPSTREAM})\n");
let upstream = numa::forward::parse_upstream(DOH_UPSTREAM, 443).expect("failed to parse"); let upstream = numa::forward::parse_upstream(DOH_UPSTREAM, 443, None).expect("failed to parse");
let resolver = rt.block_on(build_hickory_resolver()); let resolver = rt.block_on(build_hickory_resolver());
let timeout = Duration::from_secs(10); let timeout = Duration::from_secs(10);

View File

@@ -22,6 +22,7 @@ api_port = 5380
# [upstream] # [upstream]
# mode = "forward" # "forward" (default) — relay to upstream # mode = "forward" # "forward" (default) — relay to upstream
# # "recursive" — resolve from root hints (no address needed) # # "recursive" — resolve from root hints (no address needed)
# # "odoh" — Oblivious DoH (see ODoH block below)
# address = "9.9.9.9" # single upstream (plain UDP) # address = "9.9.9.9" # single upstream (plain UDP)
# address = ["192.168.1.1", "9.9.9.9:5353"] # multiple upstreams — SRTT picks fastest # address = ["192.168.1.1", "9.9.9.9:5353"] # multiple upstreams — SRTT picks fastest
# address = "https://dns.quad9.net/dns-query" # DNS-over-HTTPS (encrypted) # address = "https://dns.quad9.net/dns-query" # DNS-over-HTTPS (encrypted)
@@ -29,11 +30,29 @@ api_port = 5380
# fallback = ["8.8.8.8", "1.1.1.1"] # tried only when all primaries fail # fallback = ["8.8.8.8", "1.1.1.1"] # tried only when all primaries fail
# port = 53 # default port for addresses without :port # port = 53 # default port for addresses without :port
# timeout_ms = 3000 # timeout_ms = 3000
# hedge_ms = 10 # request hedging delay (ms). After this delay # hedge_ms = 0 # request hedging delay (ms). Default: 0 (off).
# # without a response, fires a parallel request # # Set to e.g. 10 to fire a parallel upstream
# # to the same upstream. Rescues packet loss (UDP), # # request after 10ms of silence — rescues packet
# # dispatch spikes (DoH), TLS stalls (DoT). # # loss (UDP), dispatch spikes (DoH), TLS stalls
# # Set to 0 to disable. Default: 10 # # (DoT). Doubles the upstream query count, so
# # leave off for quota'd providers (NextDNS,
# # Control D).
# ODoH (Oblivious DNS-over-HTTPS, RFC 9230). The relay sees your IP but
# not the question; the target sees the question but not your IP. Numa
# refuses same-operator relay+target configs by default (eTLD+1 check).
# [upstream]
# mode = "odoh"
# relay = "https://odoh-relay.numa.rs/proxy"
# target = "https://odoh.cloudflare-dns.com/dns-query"
# strict = true # default: refuse to downgrade to `fallback`
# # on relay failure. Set false to allow a
# # non-oblivious fallback path.
# relay_ip = "178.104.229.30" # optional: pin IPs so numa doesn't leak the
# target_ip = "104.16.249.249" # relay/target hostnames via the bootstrap
# # resolver on cold boot when numa is its
# # own system DNS. See docs/implementation/
# # bootstrap-resolver.md.
# root_hints = [ # only used in recursive mode # root_hints = [ # only used in recursive mode
# "198.41.0.4", # a.root-servers.net (Verisign) # "198.41.0.4", # a.root-servers.net (Verisign)
# "199.9.14.201", # b.root-servers.net (USC-ISI) # "199.9.14.201", # b.root-servers.net (USC-ISI)

View File

@@ -0,0 +1,72 @@
# Numa ODoH Client — Docker deploy
Single-container deploy that runs Numa as an ODoH (RFC 9230) client: every
DNS query routes through an independent relay + target so neither operator
sees both your IP and your question. See the [ODoH integration doc][odoh]
for the full protocol and privacy trade-offs.
[odoh]: ../../docs/implementation/odoh-integration.md
## Prerequisites
- Docker + Docker Compose v2.
- Port 53 (UDP+TCP) free on the host — Numa listens there for DNS
clients on your LAN.
## Configure
The shipped `numa.toml` points at Numa's own public relay
(`odoh-relay.numa.rs`) paired with Cloudflare's ODoH target
(`odoh.cloudflare-dns.com`). That's two independent operators with
distinct eTLD+1s — the default configuration passes Numa's same-operator
check and works out of the box.
To use a different relay or target, edit `numa.toml` and adjust the URLs.
The `relay` and `target` must resolve to distinct operators or Numa
refuses to start.
## Deploy
```sh
docker compose up -d
docker compose logs -f numa # watch startup
```
The first query fires the bootstrap resolver + ODoH config fetch;
subsequent queries reuse the warm HTTP/2 connection.
## Point your devices at it
Set each device's DNS server to the IP of the Docker host. For a LAN-wide
rollout, set the DNS server in your router's DHCP config so every device
picks it up automatically.
Verify a query landed on the ODoH path:
```sh
dig @<host-ip> example.com
curl http://<host-ip>:5380/stats | jq '.upstream_transport.odoh'
```
`upstream_transport.odoh` should increment on each query.
## What this does NOT buy you
ODoH protects the *path*, not the content:
- **The target (Cloudflare here) still sees the question.** It just
doesn't know it's you asking. If Cloudflare logs every ODoH query, the
query is still visible — it's simply unattributed.
- **The relay is a trusted party for availability.** A malicious relay
can drop or delay queries; it just can't read them.
- **Traffic analysis defeats small relays.** If you're the only client
talking to a relay, timing alone re-identifies you. Shared, busy relays
give better anonymity sets.
See the [ODoH integration doc][odoh] for more.
## Relay operator?
If you'd rather run your own relay (same binary, different mode), see
[`../relay/`](../relay/) — that package spins up a public-facing relay
with Caddy + ACME in front of it.

View File

@@ -0,0 +1,15 @@
services:
numa:
image: ghcr.io/razvandimescu/numa:latest
command: ["/etc/numa/numa.toml"]
ports:
- "53:53/udp"
- "53:53/tcp"
- "5380:5380/tcp" # dashboard + REST API
volumes:
- ./numa.toml:/etc/numa/numa.toml:ro
- numa_data:/var/lib/numa
restart: unless-stopped
volumes:
numa_data:

View File

@@ -0,0 +1,23 @@
# Numa — ODoH client mode (docker-compose starter).
# Sends every DNS query through an independent relay + target pair so
# neither operator sees both your IP and your question. See
# docs/implementation/odoh-integration.md for the protocol details and
# packaging/client/README.md for deploy notes.
[server]
bind_addr = "0.0.0.0:53"
api_bind_addr = "0.0.0.0"
data_dir = "/var/lib/numa"
[upstream]
mode = "odoh"
# Numa's own relay (Hetzner, systemd + Caddy). Swap to any other public
# ODoH relay if you'd rather not depend on a single operator; the protocol
# tolerates it, and Numa refuses same-operator relay+target by default.
relay = "https://odoh-relay.numa.rs/relay"
target = "https://odoh.cloudflare-dns.com/dns-query"
# strict = true (default). Relay failure → SERVFAIL, never silent downgrade.
[blocking]
enabled = true
# Default blocklist (Hagezi Pro). Edit the `lists` array to taste.

View File

@@ -1,14 +1,41 @@
#!/usr/bin/env bash #!/usr/bin/env bash
# Dev server for site/: regenerates drafts on each MD change, reloads the
# browser on each rendered HTML/CSS/JS change. Port is the first numeric arg
# (default 9000); any other args are ignored for back-compat.
#
# First run downloads chokidar-cli + browser-sync into the npm cache — slow
# once, instant after that.
set -euo pipefail set -euo pipefail
PORT="${1:-9000}" PORT=9000
for arg in "$@"; do
if [[ "$arg" =~ ^[0-9]+$ ]]; then
PORT="$arg"
break
fi
done
if [[ "${1:-}" == "--drafts" ]] || [[ "${2:-}" == "--drafts" ]]; then command -v npx >/dev/null || { echo "npx not found. Install Node.js: https://nodejs.org" >&2; exit 1; }
PORT="${PORT//--drafts/9000}" # default port if --drafts was first arg command -v pandoc >/dev/null || { echo "pandoc not found (required by 'make blog-drafts')." >&2; exit 1; }
make blog-drafts
else
make blog
fi
echo "Serving site at http://localhost:$PORT" # Initial render so the first page load has everything.
cd site && python3 -m http.server "$PORT" make blog-drafts
echo "Serving site at http://localhost:$PORT (drafts included, live reload)"
# Kill child processes on exit so re-runs don't leave orphaned watchers.
trap 'kill $(jobs -p) 2>/dev/null' EXIT INT TERM
# Regenerate HTML when MD sources or the blog template change.
npx --yes chokidar-cli \
"drafts/*.md" "blog/*.md" "site/blog-template.html" \
-c "make blog-drafts" &
# Serve + reload on rendered-asset changes.
cd site && exec npx --yes browser-sync start \
--server . \
--port "$PORT" \
--files "**/*.html,**/*.css,**/*.js" \
--no-open \
--no-notify

View File

@@ -1244,7 +1244,7 @@ async function refresh() {
// QPS calculation // QPS calculation
const now = Date.now(); const now = Date.now();
const encPct = encryptionPct(stats.transport); const encPct = encryptionPct(stats.transport, ['dot', 'doh'], ['udp', 'tcp', 'dot', 'doh']);
if (prevTotal !== null && prevTime !== null) { if (prevTotal !== null && prevTime !== null) {
const dt = (now - prevTime) / 1000; const dt = (now - prevTime) / 1000;
const dq = q.total - prevTotal; const dq = q.total - prevTotal;
@@ -1273,6 +1273,7 @@ async function refresh() {
renderMemory(stats.memory, stats); renderMemory(stats.memory, stats);
} catch (err) { } catch (err) {
console.error('[numa dashboard] render failed:', err);
document.getElementById('statusDot').className = 'status-dot error'; document.getElementById('statusDot').className = 'status-dot error';
document.getElementById('statusText').textContent = 'disconnected'; document.getElementById('statusText').textContent = 'disconnected';
} }

View File

@@ -1,5 +1,5 @@
use std::collections::HashSet; use std::collections::HashSet;
use std::time::Instant; use std::time::{Duration, Instant};
use log::{info, warn}; use log::{info, warn};
@@ -355,27 +355,144 @@ mod tests {
} }
} }
pub async fn download_blocklists(lists: &[String]) -> Vec<(String, String)> { const RETRY_DELAYS_SECS: &[u64] = &[2, 10, 30];
let client = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(30))
.gzip(true)
.build()
.unwrap_or_default();
let mut results = Vec::new(); pub async fn download_blocklists(
lists: &[String],
resolver: Option<std::sync::Arc<crate::bootstrap_resolver::NumaResolver>>,
) -> Vec<(String, String)> {
let mut builder = reqwest::Client::builder()
.timeout(Duration::from_secs(30))
.gzip(true);
if let Some(r) = resolver {
builder = builder.dns_resolver(r);
}
let client = builder.build().unwrap_or_default();
for url in lists { let fetches = lists.iter().map(|url| {
match client.get(url).send().await { let client = &client;
Ok(resp) => match resp.text().await { async move {
Ok(text) => { let text = fetch_with_retry(client, url).await?;
info!("downloaded blocklist: {} ({} bytes)", url, text.len()); info!("downloaded blocklist: {} ({} bytes)", url, text.len());
results.push((url.clone(), text)); Some((url.clone(), text))
} }
Err(e) => warn!("failed to read blocklist body {}: {}", url, e), });
}, futures::future::join_all(fetches)
Err(e) => warn!("failed to download blocklist {}: {}", url, e), .await
.into_iter()
.flatten()
.collect()
}
async fn fetch_with_retry(client: &reqwest::Client, url: &str) -> Option<String> {
fetch_with_retry_delays(client, url, RETRY_DELAYS_SECS).await
}
async fn fetch_with_retry_delays(
client: &reqwest::Client,
url: &str,
delays: &[u64],
) -> Option<String> {
let total = delays.len() + 1;
for attempt in 1..=total {
match fetch_once(client, url).await {
Ok(text) => return Some(text),
Err(msg) if attempt < total => {
let delay = delays[attempt - 1];
warn!(
"blocklist {} attempt {}/{} failed: {} — retrying in {}s",
url, attempt, total, msg, delay
);
tokio::time::sleep(Duration::from_secs(delay)).await;
} }
Err(msg) => {
warn!(
"blocklist {} attempt {}/{} failed: {} — giving up",
url, attempt, total, msg
);
}
}
}
None
}
async fn fetch_once(client: &reqwest::Client, url: &str) -> Result<String, String> {
let resp = client
.get(url)
.send()
.await
.map_err(|e| format_error_chain(&e))?;
resp.text().await.map_err(|e| format_error_chain(&e))
}
fn format_error_chain(e: &(dyn std::error::Error + 'static)) -> String {
let mut parts = vec![e.to_string()];
let mut src = e.source();
while let Some(s) = src {
parts.push(s.to_string());
src = s.source();
}
parts.join(": ")
}
#[cfg(test)]
mod retry_tests {
use super::*;
use std::net::SocketAddr;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpListener;
async fn flaky_http_server(drop_first_n: usize, body: &'static str) -> SocketAddr {
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let addr = listener.local_addr().unwrap();
tokio::spawn(async move {
for _ in 0..drop_first_n {
if let Ok((sock, _)) = listener.accept().await {
drop(sock);
}
}
loop {
let Ok((mut sock, _)) = listener.accept().await else {
return;
};
tokio::spawn(async move {
let mut buf = [0u8; 2048];
let _ = sock.read(&mut buf).await;
let response = format!(
"HTTP/1.1 200 OK\r\nContent-Length: {}\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\n{}",
body.len(),
body,
);
let _ = sock.write_all(response.as_bytes()).await;
let _ = sock.shutdown().await;
});
}
});
addr
} }
results fn zero_delays() -> Vec<u64> {
vec![0; RETRY_DELAYS_SECS.len()]
}
#[tokio::test]
async fn retry_succeeds_on_final_attempt() {
let body = "ads.example.com\ntracker.example.net\n";
let delays = zero_delays();
let addr = flaky_http_server(delays.len(), body).await;
let client = reqwest::Client::new();
let url = format!("http://{addr}/");
let result = fetch_with_retry_delays(&client, &url, &delays).await;
assert_eq!(result.as_deref(), Some(body));
}
#[tokio::test]
async fn retry_gives_up_when_all_attempts_fail() {
let delays = zero_delays();
let addr = flaky_http_server(delays.len() + 2, "unreachable").await;
let client = reqwest::Client::new();
let url = format!("http://{addr}/");
let result = fetch_with_retry_delays(&client, &url, &delays).await;
assert_eq!(result, None);
}
} }

235
src/bootstrap_resolver.rs Normal file
View File

@@ -0,0 +1,235 @@
//! `reqwest` DNS resolver used by numa-originated HTTPS (DoH upstream, ODoH
//! relay/target, blocklist CDN). When numa is its own system resolver
//! (`/etc/resolv.conf → 127.0.0.1`, HAOS add-on, Pi-hole-style container),
//! the default `getaddrinfo` path loops back through numa before numa can
//! answer — a chicken-and-egg that deadlocks cold boot. See issue #122 and
//! `docs/implementation/bootstrap-resolver.md`.
//!
//! Resolution order per hostname:
//! 1. Per-hostname overrides (e.g. ODoH `relay_ip` / `target_ip`) → return
//! immediately, no DNS query. Preserves ODoH's "zero plain-DNS leak"
//! property for configured endpoints.
//! 2. Otherwise, query A + AAAA in parallel via UDP to IP-literal bootstrap
//! servers, with TCP fallback on UDP timeout (for networks that block
//! outbound UDP:53 — see memory: `project_network_udp_hostile.md`).
use std::collections::BTreeMap;
use std::net::{IpAddr, Ipv4Addr, SocketAddr};
use std::time::Duration;
use log::{debug, info, warn};
use reqwest::dns::{Addrs, Name, Resolve, Resolving};
use crate::forward::{forward_tcp, forward_udp};
use crate::packet::DnsPacket;
use crate::question::QueryType;
use crate::record::DnsRecord;
const UDP_TIMEOUT: Duration = Duration::from_millis(800);
const TCP_TIMEOUT: Duration = Duration::from_millis(1500);
const DEFAULT_BOOTSTRAP: &[SocketAddr] = &[
SocketAddr::new(IpAddr::V4(Ipv4Addr::new(9, 9, 9, 9)), 53),
SocketAddr::new(IpAddr::V4(Ipv4Addr::new(1, 1, 1, 1)), 53),
];
pub struct NumaResolver {
bootstrap: Vec<SocketAddr>,
overrides: BTreeMap<String, Vec<IpAddr>>,
}
impl NumaResolver {
/// Build a resolver from the configured `upstream.fallback` list and any
/// per-hostname overrides (e.g. ODoH's `relay_ip`/`target_ip`).
///
/// `fallback` entries are filtered to IP literals only — hostnames would
/// re-introduce the self-loop inside the resolver itself. Empty or
/// unusable fallback yields the hardcoded default (Quad9 + Cloudflare).
pub fn new(fallback: &[String], overrides: BTreeMap<String, Vec<IpAddr>>) -> Self {
let mut bootstrap: Vec<SocketAddr> = Vec::with_capacity(fallback.len());
for entry in fallback {
match crate::forward::parse_upstream_addr(entry, 53) {
Ok(addr) => bootstrap.push(addr),
Err(_) => {
warn!(
"bootstrap_resolver: skipping non-IP fallback '{}' \
(hostnames would re-enter the self-loop)",
entry
);
}
}
}
let source = if bootstrap.is_empty() {
bootstrap = DEFAULT_BOOTSTRAP.to_vec();
"default (no IP-literal in upstream.fallback)"
} else {
"upstream.fallback"
};
let ips: Vec<String> = bootstrap.iter().map(|s| s.ip().to_string()).collect();
info!(
"bootstrap resolver: {} via {} — used for numa-originated HTTPS hostname resolution",
ips.join(", "),
source
);
if !overrides.is_empty() {
let pairs: Vec<String> = overrides
.iter()
.flat_map(|(host, addrs)| addrs.iter().map(move |ip| format!("{}={}", host, ip)))
.collect();
info!(
"bootstrap resolver: host overrides (skip DNS, connect direct): {}",
pairs.join(", ")
);
}
Self {
bootstrap,
overrides,
}
}
#[cfg(test)]
pub fn bootstrap(&self) -> &[SocketAddr] {
&self.bootstrap
}
}
impl Resolve for NumaResolver {
fn resolve(&self, name: Name) -> Resolving {
let hostname = name.as_str().to_string();
if let Some(ips) = self.overrides.get(&hostname) {
let addrs: Vec<SocketAddr> = ips.iter().map(|ip| SocketAddr::new(*ip, 0)).collect();
debug!(
"bootstrap_resolver: override hit for {} → {:?}",
hostname, ips
);
return Box::pin(async move { Ok(Box::new(addrs.into_iter()) as Addrs) });
}
let bootstrap = self.bootstrap.clone();
Box::pin(async move {
let addrs = resolve_via_bootstrap(&hostname, &bootstrap).await?;
debug!(
"bootstrap_resolver: resolved {} → {} addr(s)",
hostname,
addrs.len()
);
Ok(Box::new(addrs.into_iter()) as Addrs)
})
}
}
async fn resolve_via_bootstrap(
hostname: &str,
bootstrap: &[SocketAddr],
) -> Result<Vec<SocketAddr>, Box<dyn std::error::Error + Send + Sync>> {
let mut last_err: Option<String> = None;
for &server in bootstrap {
let q_a = DnsPacket::query(0xBEEF, hostname, QueryType::A);
let q_aaaa = DnsPacket::query(0xBEF0, hostname, QueryType::AAAA);
let (a_res, aaaa_res) = tokio::join!(
query_with_tcp_fallback(&q_a, server),
query_with_tcp_fallback(&q_aaaa, server),
);
let mut out = Vec::new();
match a_res {
Ok(pkt) => extract_addrs(&pkt, &mut out),
Err(e) => last_err = Some(format!("{} A failed: {}", server, e)),
}
match aaaa_res {
Ok(pkt) => extract_addrs(&pkt, &mut out),
// AAAA is optional — many hosts return NXDOMAIN/empty. Don't
// treat as the primary error if A succeeded.
Err(e) => debug!("bootstrap {} AAAA for {} failed: {}", server, hostname, e),
}
if !out.is_empty() {
return Ok(out);
}
}
Err(last_err
.unwrap_or_else(|| "no bootstrap servers reachable".into())
.into())
}
async fn query_with_tcp_fallback(
query: &DnsPacket,
server: SocketAddr,
) -> crate::Result<DnsPacket> {
match forward_udp(query, server, UDP_TIMEOUT).await {
Ok(pkt) => Ok(pkt),
Err(e) => {
debug!(
"bootstrap UDP {} failed ({}), falling back to TCP",
server, e
);
forward_tcp(query, server, TCP_TIMEOUT).await
}
}
}
fn extract_addrs(pkt: &DnsPacket, out: &mut Vec<SocketAddr>) {
for r in &pkt.answers {
match r {
DnsRecord::A { addr, .. } => out.push(SocketAddr::new(IpAddr::V4(*addr), 0)),
DnsRecord::AAAA { addr, .. } => out.push(SocketAddr::new(IpAddr::V6(*addr), 0)),
_ => {}
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::net::{Ipv4Addr, Ipv6Addr};
#[test]
fn empty_fallback_uses_defaults() {
let r = NumaResolver::new(&[], BTreeMap::new());
let got: Vec<String> = r.bootstrap().iter().map(|s| s.to_string()).collect();
assert_eq!(got, vec!["9.9.9.9:53", "1.1.1.1:53"]);
}
#[test]
fn fallback_accepts_ip_literals_only() {
let fallback = vec![
"9.9.9.9".to_string(),
"dns.quad9.net".to_string(),
"1.1.1.1:5353".to_string(),
];
let r = NumaResolver::new(&fallback, BTreeMap::new());
let got: Vec<String> = r.bootstrap().iter().map(|s| s.to_string()).collect();
assert_eq!(got, vec!["9.9.9.9:53", "1.1.1.1:5353"]);
}
#[test]
fn override_returns_configured_ips_without_dns() {
let mut overrides = BTreeMap::new();
overrides.insert(
"odoh-relay.example".to_string(),
vec![IpAddr::V4(Ipv4Addr::new(178, 104, 229, 30))],
);
let r = NumaResolver::new(&[], overrides);
let name: Name = "odoh-relay.example".parse().unwrap();
let fut = r.resolve(name);
let res = futures::executor::block_on(fut).unwrap();
let addrs: Vec<_> = res.collect();
assert_eq!(addrs.len(), 1);
assert_eq!(addrs[0].ip(), IpAddr::V4(Ipv4Addr::new(178, 104, 229, 30)));
}
#[test]
fn override_supports_multiple_ips_including_ipv6() {
let mut overrides = BTreeMap::new();
overrides.insert(
"dual.example".to_string(),
vec![
IpAddr::V4(Ipv4Addr::new(1, 2, 3, 4)),
IpAddr::V6(Ipv6Addr::LOCALHOST),
],
);
let r = NumaResolver::new(&[], overrides);
let res = futures::executor::block_on(r.resolve("dual.example".parse().unwrap())).unwrap();
let addrs: Vec<_> = res.collect();
assert_eq!(addrs.len(), 2);
}
}

View File

@@ -56,7 +56,7 @@ impl ForwardingRuleConfig {
} }
let mut primary = Vec::with_capacity(self.upstream.len()); let mut primary = Vec::with_capacity(self.upstream.len());
for s in &self.upstream { for s in &self.upstream {
let u = crate::forward::parse_upstream(s, 53) let u = crate::forward::parse_upstream(s, 53, None)
.map_err(|e| format!("forwarding rule for upstream '{}': {}", s, e))?; .map_err(|e| format!("forwarding rule for upstream '{}': {}", s, e))?;
primary.push(u); primary.push(u);
} }
@@ -241,6 +241,26 @@ pub struct OdohUpstream {
pub target_bootstrap: Option<SocketAddr>, pub target_bootstrap: Option<SocketAddr>,
} }
impl OdohUpstream {
/// Per-host IP overrides for the bootstrap resolver, lifted from
/// `relay_ip`/`target_ip`. Keeps the "zero plain-DNS leak for ODoH
/// endpoints" property when numa is its own system resolver.
pub fn host_ip_overrides(&self) -> std::collections::BTreeMap<String, Vec<std::net::IpAddr>> {
let mut out = std::collections::BTreeMap::new();
if let Some(addr) = self.relay_bootstrap {
out.entry(self.relay_host.clone())
.or_insert_with(Vec::new)
.push(addr.ip());
}
if let Some(addr) = self.target_bootstrap {
out.entry(self.target_host.clone())
.or_insert_with(Vec::new)
.push(addr.ip());
}
out
}
}
impl UpstreamConfig { impl UpstreamConfig {
/// Validate and extract ODoH-specific fields. Called during `load_config` /// Validate and extract ODoH-specific fields. Called during `load_config`
/// so misconfigured ODoH fails fast at startup, the same care we take /// so misconfigured ODoH fails fast at startup, the same care we take
@@ -263,25 +283,29 @@ impl UpstreamConfig {
if relay_url.scheme() != "https" || target_url.scheme() != "https" { if relay_url.scheme() != "https" || target_url.scheme() != "https" {
return Err("upstream.relay and upstream.target must both use https://".into()); return Err("upstream.relay and upstream.target must both use https://".into());
} }
if relay_url.host_str().is_none() || target_url.host_str().is_none() {
return Err("upstream.relay and upstream.target must include a host".into());
}
if relay_url.host_str() == target_url.host_str() {
return Err(format!(
"upstream.relay and upstream.target resolve to the same host ({}); the privacy property requires distinct operators",
relay_url.host_str().unwrap_or("?")
)
.into());
}
let relay_host = relay_url let relay_host = relay_url
.host_str() .host_str()
.ok_or("upstream.relay has no host")? .ok_or("upstream.relay must include a host")?
.to_string(); .to_string();
let target_host = target_url let target_host = target_url
.host_str() .host_str()
.ok_or("upstream.target has no host")? .ok_or("upstream.target must include a host")?
.to_string(); .to_string();
if relay_host == target_host {
return Err(format!(
"upstream.relay and upstream.target resolve to the same host ({}); the privacy property requires distinct operators",
relay_host
)
.into());
}
if let Some(shared) = shared_registrable_domain(&relay_host, &target_host) {
return Err(format!(
"upstream.relay ({}) and upstream.target ({}) share the registrable domain ({}); the privacy property requires distinct operators",
relay_host, target_host, shared
)
.into());
}
let target_path = if target_url.path().is_empty() { let target_path = if target_url.path().is_empty() {
"/".to_string() "/".to_string()
} else { } else {
@@ -303,6 +327,20 @@ impl UpstreamConfig {
} }
} }
/// Returns the registrable domain (eTLD+1) shared by both hosts, if any.
/// Fails open on hosts the PSL can't parse (IP literals, bare TLDs).
fn shared_registrable_domain(relay_host: &str, target_host: &str) -> Option<String> {
let relay = psl::domain(relay_host.as_bytes())?;
let target = psl::domain(target_host.as_bytes())?;
if relay.as_bytes() == target.as_bytes() {
std::str::from_utf8(relay.as_bytes())
.ok()
.map(str::to_owned)
} else {
None
}
}
fn string_or_vec<'de, D>(deserializer: D) -> std::result::Result<Vec<String>, D::Error> fn string_or_vec<'de, D>(deserializer: D) -> std::result::Result<Vec<String>, D::Error>
where where
D: serde::Deserializer<'de>, D: serde::Deserializer<'de>,
@@ -413,8 +451,12 @@ fn default_upstream_port() -> u16 {
fn default_timeout_ms() -> u64 { fn default_timeout_ms() -> u64 {
5000 5000
} }
/// Off by default: hedging fires a second upstream query, which silently
/// doubles the count at the provider — hurts quota'd DNS (NextDNS, Control
/// D). Opt in with `hedge_ms = 10` for tail-latency rescue on flaky nets
/// or handshake-slow DoT.
fn default_hedge_ms() -> u64 { fn default_hedge_ms() -> u64 {
10 0
} }
#[derive(Deserialize)] #[derive(Deserialize)]
@@ -830,6 +872,59 @@ target = "https://odoh.example.com/dns-query"
assert!(err.contains("same host"), "got: {err}"); assert!(err.contains("same host"), "got: {err}");
} }
#[test]
fn odoh_rejects_shared_registrable_domain() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://r.cloudflare.com/relay"
target = "https://odoh.cloudflare.com/dns-query"
"#;
let config: Config = toml::from_str(toml).unwrap();
let err = config.upstream.odoh_upstream().unwrap_err().to_string();
assert!(err.contains("registrable domain"), "got: {err}");
assert!(err.contains("cloudflare.com"), "got: {err}");
}
#[test]
fn odoh_rejects_shared_registrable_under_multi_label_suffix() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://a.foo.co.uk/relay"
target = "https://b.foo.co.uk/dns-query"
"#;
let config: Config = toml::from_str(toml).unwrap();
let err = config.upstream.odoh_upstream().unwrap_err().to_string();
assert!(err.contains("foo.co.uk"), "got: {err}");
}
#[test]
fn odoh_accepts_distinct_registrable_under_multi_label_suffix() {
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://relay.foo.co.uk/relay"
target = "https://target.bar.co.uk/dns-query"
"#;
let config: Config = toml::from_str(toml).unwrap();
assert!(config.upstream.odoh_upstream().is_ok());
}
#[test]
fn odoh_accepts_distinct_private_psl_suffix_subdomains() {
// *.github.io is a public suffix, so foo.github.io and bar.github.io
// are independent registrable domains — accept.
let toml = r#"
[upstream]
mode = "odoh"
relay = "https://foo.github.io/relay"
target = "https://bar.github.io/dns-query"
"#;
let config: Config = toml::from_str(toml).unwrap();
assert!(config.upstream.odoh_upstream().is_ok());
}
#[test] #[test]
fn odoh_rejects_non_https() { fn odoh_rejects_non_https() {
let toml = r#" let toml = r#"

View File

@@ -209,45 +209,32 @@ pub async fn resolve_query(
{ {
// Conditional forwarding takes priority over recursive mode // Conditional forwarding takes priority over recursive mode
// (e.g. Tailscale .ts.net, VPC private zones) // (e.g. Tailscale .ts.net, VPC private zones)
upstream_transport = pool.preferred().map(|u| u.transport()); let key = (qname.clone(), qtype);
match forward_with_failover_raw( let (resp, path, err) =
resolve_coalesced(&ctx.inflight, key, &query, QueryPath::Forwarded, || async {
let wire = forward_with_failover_raw(
raw_wire, raw_wire,
pool, pool,
&ctx.srtt, &ctx.srtt,
ctx.timeout, ctx.timeout,
ctx.hedge_delay, ctx.hedge_delay,
) )
.await .await?;
{ cache_and_parse(ctx, &qname, qtype, &wire)
Ok(resp_wire) => match cache_and_parse(ctx, &qname, qtype, &resp_wire) { })
Ok(resp) => (resp, QueryPath::Forwarded, DnssecStatus::Indeterminate), .await;
Err(e) => { log_coalesced_outcome(src_addr, qtype, &qname, path, err.as_deref(), "FORWARD");
error!("{} | {:?} {} | PARSE ERROR | {}", src_addr, qtype, qname, e); if path == QueryPath::Forwarded {
( upstream_transport = pool.preferred().map(|u| u.transport());
DnsPacket::response_from(&query, ResultCode::SERVFAIL),
QueryPath::UpstreamError,
DnssecStatus::Indeterminate,
)
}
},
Err(e) => {
error!(
"{} | {:?} {} | FORWARD ERROR | {}",
src_addr, qtype, qname, e
);
(
DnsPacket::response_from(&query, ResultCode::SERVFAIL),
QueryPath::UpstreamError,
DnssecStatus::Indeterminate,
)
}
} }
(resp, path, DnssecStatus::Indeterminate)
} else if ctx.upstream_mode == UpstreamMode::Recursive { } else if ctx.upstream_mode == UpstreamMode::Recursive {
// Recursive resolution makes UDP hops to roots/TLDs/auths; // Recursive resolution makes UDP hops to roots/TLDs/auths;
// tag as Udp so the dashboard can aggregate plaintext-wire // tag as Udp so the dashboard can aggregate plaintext-wire
// egress honestly. Only mark on success — errors stay None. // egress honestly. Only mark on success — errors stay None.
let key = (qname.clone(), qtype); let key = (qname.clone(), qtype);
let (resp, path, err) = resolve_coalesced(&ctx.inflight, key, &query, || { let (resp, path, err) =
resolve_coalesced(&ctx.inflight, key, &query, QueryPath::Recursive, || {
crate::recursive::resolve_recursive( crate::recursive::resolve_recursive(
&qname, &qname,
qtype, qtype,
@@ -258,57 +245,32 @@ pub async fn resolve_query(
) )
}) })
.await; .await;
if path == QueryPath::Coalesced { log_coalesced_outcome(src_addr, qtype, &qname, path, err.as_deref(), "RECURSIVE");
debug!("{} | {:?} {} | COALESCED", src_addr, qtype, qname); if path == QueryPath::Recursive {
} else if path == QueryPath::UpstreamError {
error!(
"{} | {:?} {} | RECURSIVE ERROR | {}",
src_addr,
qtype,
qname,
err.as_deref().unwrap_or("leader failed")
);
} else {
upstream_transport = Some(crate::stats::UpstreamTransport::Udp); upstream_transport = Some(crate::stats::UpstreamTransport::Udp);
} }
(resp, path, DnssecStatus::Indeterminate) (resp, path, DnssecStatus::Indeterminate)
} else { } else {
let pool = ctx.upstream_pool.lock().unwrap().clone(); let pool = ctx.upstream_pool.lock().unwrap().clone();
match forward_with_failover_raw( let key = (qname.clone(), qtype);
let (resp, path, err) =
resolve_coalesced(&ctx.inflight, key, &query, QueryPath::Upstream, || async {
let wire = forward_with_failover_raw(
raw_wire, raw_wire,
&pool, &pool,
&ctx.srtt, &ctx.srtt,
ctx.timeout, ctx.timeout,
ctx.hedge_delay, ctx.hedge_delay,
) )
.await .await?;
{ cache_and_parse(ctx, &qname, qtype, &wire)
Ok(resp_wire) => match cache_and_parse(ctx, &qname, qtype, &resp_wire) { })
Ok(resp) => { .await;
log_coalesced_outcome(src_addr, qtype, &qname, path, err.as_deref(), "UPSTREAM");
if path == QueryPath::Upstream {
upstream_transport = pool.preferred().map(|u| u.transport()); upstream_transport = pool.preferred().map(|u| u.transport());
(resp, QueryPath::Upstream, DnssecStatus::Indeterminate)
}
Err(e) => {
error!("{} | {:?} {} | PARSE ERROR | {}", src_addr, qtype, qname, e);
(
DnsPacket::response_from(&query, ResultCode::SERVFAIL),
QueryPath::UpstreamError,
DnssecStatus::Indeterminate,
)
}
},
Err(e) => {
error!(
"{} | {:?} {} | UPSTREAM ERROR | {}",
src_addr, qtype, qname, e
);
(
DnsPacket::response_from(&query, ResultCode::SERVFAIL),
QueryPath::UpstreamError,
DnssecStatus::Indeterminate,
)
}
} }
(resp, path, DnssecStatus::Indeterminate)
} }
} }
}; };
@@ -611,11 +573,15 @@ fn acquire_inflight(inflight: &Mutex<InflightMap>, key: (String, QueryType)) ->
/// Run a resolve function with in-flight coalescing. Multiple concurrent calls /// Run a resolve function with in-flight coalescing. Multiple concurrent calls
/// for the same key share a single resolution — the first caller (leader) /// for the same key share a single resolution — the first caller (leader)
/// executes `resolve_fn`, and followers wait for the broadcast result. /// executes `resolve_fn`, and followers wait for the broadcast result. The
/// leader's successful path is tagged with `leader_path` so callers that
/// share this helper (recursive, forwarded-rule, forward-upstream) keep their
/// own observability without duplicating the inflight map.
async fn resolve_coalesced<F, Fut>( async fn resolve_coalesced<F, Fut>(
inflight: &Mutex<InflightMap>, inflight: &Mutex<InflightMap>,
key: (String, QueryType), key: (String, QueryType),
query: &DnsPacket, query: &DnsPacket,
leader_path: QueryPath,
resolve_fn: F, resolve_fn: F,
) -> (DnsPacket, QueryPath, Option<String>) ) -> (DnsPacket, QueryPath, Option<String>)
where where
@@ -644,7 +610,7 @@ where
match result { match result {
Ok(resp) => { Ok(resp) => {
let _ = tx.send(Some(resp.clone())); let _ = tx.send(Some(resp.clone()));
(resp, QueryPath::Recursive, None) (resp, leader_path, None)
} }
Err(e) => { Err(e) => {
let _ = tx.send(None); let _ = tx.send(None);
@@ -671,6 +637,33 @@ impl Drop for InflightGuard<'_> {
} }
} }
/// Emit the log lines shared by the three upstream branches (Forwarded,
/// Recursive, Upstream) after `resolve_coalesced` returns. Leader-success
/// and transport-tagging stay at the call site since they diverge per
/// branch, but the Coalesced debug and UpstreamError error are identical
/// except for the label.
fn log_coalesced_outcome(
src_addr: SocketAddr,
qtype: QueryType,
qname: &str,
path: QueryPath,
err: Option<&str>,
label: &str,
) {
match path {
QueryPath::Coalesced => debug!("{} | {:?} {} | COALESCED", src_addr, qtype, qname),
QueryPath::UpstreamError => error!(
"{} | {:?} {} | {} ERROR | {}",
src_addr,
qtype,
qname,
label,
err.unwrap_or("leader failed")
),
_ => {}
}
}
fn special_use_response(query: &DnsPacket, qname: &str, qtype: QueryType) -> DnsPacket { fn special_use_response(query: &DnsPacket, qname: &str, qtype: QueryType) -> DnsPacket {
use std::net::{Ipv4Addr, Ipv6Addr}; use std::net::{Ipv4Addr, Ipv6Addr};
if qname == "ipv4only.arpa" { if qname == "ipv4only.arpa" {
@@ -909,7 +902,7 @@ mod tests {
let key = ("coalesce.test".to_string(), QueryType::A); let key = ("coalesce.test".to_string(), QueryType::A);
let query = DnsPacket::query(100 + i, "coalesce.test", QueryType::A); let query = DnsPacket::query(100 + i, "coalesce.test", QueryType::A);
handles.push(tokio::spawn(async move { handles.push(tokio::spawn(async move {
resolve_coalesced(&inf, key, &query, || async { resolve_coalesced(&inf, key, &query, QueryPath::Recursive, || async {
count.fetch_add(1, std::sync::atomic::Ordering::Relaxed); count.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
tokio::time::sleep(Duration::from_millis(200)).await; tokio::time::sleep(Duration::from_millis(200)).await;
Ok(mock_response("coalesce.test")) Ok(mock_response("coalesce.test"))
@@ -953,6 +946,7 @@ mod tests {
&inf1, &inf1,
("same.domain".to_string(), QueryType::A), ("same.domain".to_string(), QueryType::A),
&query_a, &query_a,
QueryPath::Recursive,
|| async { || async {
count1.fetch_add(1, std::sync::atomic::Ordering::Relaxed); count1.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
tokio::time::sleep(Duration::from_millis(100)).await; tokio::time::sleep(Duration::from_millis(100)).await;
@@ -966,6 +960,7 @@ mod tests {
&inf2, &inf2,
("same.domain".to_string(), QueryType::AAAA), ("same.domain".to_string(), QueryType::AAAA),
&query_aaaa, &query_aaaa,
QueryPath::Recursive,
|| async { || async {
count2.fetch_add(1, std::sync::atomic::Ordering::Relaxed); count2.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
tokio::time::sleep(Duration::from_millis(100)).await; tokio::time::sleep(Duration::from_millis(100)).await;
@@ -995,6 +990,7 @@ mod tests {
&inflight, &inflight,
("will-fail.test".to_string(), QueryType::A), ("will-fail.test".to_string(), QueryType::A),
&query, &query,
QueryPath::Recursive,
|| async { Err::<DnsPacket, _>("upstream timeout".into()) }, || async { Err::<DnsPacket, _>("upstream timeout".into()) },
) )
.await; .await;
@@ -1016,6 +1012,7 @@ mod tests {
&inf, &inf,
("fail.test".to_string(), QueryType::A), ("fail.test".to_string(), QueryType::A),
&query, &query,
QueryPath::Recursive,
|| async { || async {
tokio::time::sleep(Duration::from_millis(200)).await; tokio::time::sleep(Duration::from_millis(200)).await;
Err::<DnsPacket, _>("upstream error".into()) Err::<DnsPacket, _>("upstream error".into())
@@ -1056,6 +1053,7 @@ mod tests {
&inflight, &inflight,
("question.test".to_string(), QueryType::A), ("question.test".to_string(), QueryType::A),
&query, &query,
QueryPath::Recursive,
|| async { Err::<DnsPacket, _>("fail".into()) }, || async { Err::<DnsPacket, _>("fail".into()) },
) )
.await; .await;
@@ -1080,6 +1078,7 @@ mod tests {
&inflight, &inflight,
("err-msg.test".to_string(), QueryType::A), ("err-msg.test".to_string(), QueryType::A),
&query, &query,
QueryPath::Recursive,
|| async { Err::<DnsPacket, _>("connection refused by upstream".into()) }, || async { Err::<DnsPacket, _>("connection refused by upstream".into()) },
) )
.await; .await;

View File

@@ -113,10 +113,7 @@ impl fmt::Display for Upstream {
} }
} }
pub(crate) fn parse_upstream_addr( pub fn parse_upstream_addr(s: &str, default_port: u16) -> std::result::Result<SocketAddr, String> {
s: &str,
default_port: u16,
) -> std::result::Result<SocketAddr, String> {
// Try full socket addr first: "1.2.3.4:5353" or "[::1]:5353" // Try full socket addr first: "1.2.3.4:5353" or "[::1]:5353"
if let Ok(addr) = s.parse::<SocketAddr>() { if let Ok(addr) = s.parse::<SocketAddr>() {
return Ok(addr); return Ok(addr);
@@ -129,19 +126,28 @@ pub(crate) fn parse_upstream_addr(
} }
/// Parse a slice of upstream address strings into `Upstream` values, failing /// Parse a slice of upstream address strings into `Upstream` values, failing
/// on the first invalid entry. /// on the first invalid entry. DoH entries use `resolver` (when provided) as
pub fn parse_upstream_list(addrs: &[String], default_port: u16) -> Result<Vec<Upstream>> { /// their hostname resolver.
pub fn parse_upstream_list(
addrs: &[String],
default_port: u16,
resolver: Option<Arc<crate::bootstrap_resolver::NumaResolver>>,
) -> Result<Vec<Upstream>> {
addrs addrs
.iter() .iter()
.map(|s| parse_upstream(s, default_port)) .map(|s| parse_upstream(s, default_port, resolver.clone()))
.collect() .collect()
} }
pub fn parse_upstream(s: &str, default_port: u16) -> Result<Upstream> { pub fn parse_upstream(
s: &str,
default_port: u16,
resolver: Option<Arc<crate::bootstrap_resolver::NumaResolver>>,
) -> Result<Upstream> {
if s.starts_with("https://") { if s.starts_with("https://") {
return Ok(Upstream::Doh { return Ok(Upstream::Doh {
url: s.to_string(), url: s.to_string(),
client: build_https_client(), client: build_https_client_with_resolver(1, resolver),
}); });
} }
// tls://IP:PORT#hostname or tls://IP#hostname (default port 853) // tls://IP:PORT#hostname or tls://IP#hostname (default port 853)
@@ -163,12 +169,16 @@ pub fn parse_upstream(s: &str, default_port: u16) -> Result<Upstream> {
} }
/// HTTP/2 client tuned for DoH/ODoH: small windows for low latency, long-lived /// HTTP/2 client tuned for DoH/ODoH: small windows for low latency, long-lived
/// keep-alive. Shared by the DoH upstream and the ODoH config-fetcher + /// keep-alive. Pool defaults to one idle conn per host — good for resolvers
/// seal/open path. Pool defaults to one idle conn per host — good for /// that talk to a single upstream; relays that fan out to many targets
/// resolvers that talk to a single upstream; relays that fan out to many /// should use [`build_https_client_with_pool`].
/// targets should use [`build_https_client_with_pool`]. ///
/// Uses the system resolver. Callers running inside `serve::run` pass the
/// shared [`crate::bootstrap_resolver::NumaResolver`] via
/// [`build_https_client_with_resolver`] to avoid the self-loop documented
/// in `docs/implementation/bootstrap-resolver.md`.
pub fn build_https_client() -> reqwest::Client { pub fn build_https_client() -> reqwest::Client {
build_https_client_with_pool(1) build_https_client_with_resolver(1, None)
} }
/// Same shape as [`build_https_client`], but caller picks /// Same shape as [`build_https_client`], but caller picks
@@ -176,20 +186,18 @@ pub fn build_https_client() -> reqwest::Client {
/// and benefit from a larger pool so warm connections survive concurrent /// and benefit from a larger pool so warm connections survive concurrent
/// fan-out. /// fan-out.
pub fn build_https_client_with_pool(pool_max_idle_per_host: usize) -> reqwest::Client { pub fn build_https_client_with_pool(pool_max_idle_per_host: usize) -> reqwest::Client {
https_client_builder(pool_max_idle_per_host) build_https_client_with_resolver(pool_max_idle_per_host, None)
.build()
.unwrap_or_default()
} }
/// HTTPS client for the ODoH upstream, with bootstrap-IP overrides applied /// [`build_https_client`] with an optional custom DNS resolver. Numa wires
/// so relay/target hostname resolution can bypass system DNS. /// [`crate::bootstrap_resolver::NumaResolver`] here.
pub fn build_odoh_client(odoh: &crate::config::OdohUpstream) -> reqwest::Client { pub fn build_https_client_with_resolver(
let mut builder = https_client_builder(1); pool_max_idle_per_host: usize,
if let Some(addr) = odoh.relay_bootstrap { resolver: Option<Arc<crate::bootstrap_resolver::NumaResolver>>,
builder = builder.resolve(&odoh.relay_host, addr); ) -> reqwest::Client {
} let mut builder = https_client_builder(pool_max_idle_per_host);
if let Some(addr) = odoh.target_bootstrap { if let Some(r) = resolver {
builder = builder.resolve(&odoh.target_host, addr); builder = builder.dns_resolver(r);
} }
builder.build().unwrap_or_default() builder.build().unwrap_or_default()
} }
@@ -553,6 +561,9 @@ async fn forward_doh_raw(
/// Send a lightweight keepalive query to a DoH upstream to prevent /// Send a lightweight keepalive query to a DoH upstream to prevent
/// the HTTP/2 + TLS connection from going idle and being torn down. /// the HTTP/2 + TLS connection from going idle and being torn down.
/// The first call doubles as a startup warm-up: bootstrap-resolver failures
/// (unreachable Quad9/Cloudflare defaults, misconfigured hostname upstream)
/// surface here rather than on the first client query.
pub async fn keepalive_doh(upstream: &Upstream) { pub async fn keepalive_doh(upstream: &Upstream) {
if let Upstream::Doh { url, client } = upstream { if let Upstream::Doh { url, client } = upstream {
// Query for . NS — minimal, always succeeds, response is small // Query for . NS — minimal, always succeeds, response is small
@@ -565,7 +576,9 @@ pub async fn keepalive_doh(upstream: &Upstream) {
0x00, 0x02, // type NS 0x00, 0x02, // type NS
0x00, 0x01, // class IN 0x00, 0x01, // class IN
]; ];
let _ = forward_doh_raw(wire, url, client, Duration::from_secs(5)).await; if let Err(e) = forward_doh_raw(wire, url, client, Duration::from_secs(5)).await {
log::warn!("DoH keepalive to {} failed: {}", url, e);
}
} }
} }

View File

@@ -1,5 +1,6 @@
pub mod api; pub mod api;
pub mod blocklist; pub mod blocklist;
pub mod bootstrap_resolver;
pub mod buffer; pub mod buffer;
pub mod cache; pub mod cache;
pub mod config; pub mod config;

View File

@@ -13,12 +13,13 @@ use log::{error, info};
use tokio::net::UdpSocket; use tokio::net::UdpSocket;
use crate::blocklist::{download_blocklists, parse_blocklist, BlocklistStore}; use crate::blocklist::{download_blocklists, parse_blocklist, BlocklistStore};
use crate::bootstrap_resolver::NumaResolver;
use crate::buffer::BytePacketBuffer; use crate::buffer::BytePacketBuffer;
use crate::cache::DnsCache; use crate::cache::DnsCache;
use crate::config::{build_zone_map, load_config, ConfigLoad}; use crate::config::{build_zone_map, load_config, ConfigLoad};
use crate::ctx::{handle_query, ServerCtx}; use crate::ctx::{handle_query, ServerCtx};
use crate::forward::{ use crate::forward::{
build_https_client, build_odoh_client, parse_upstream_list, Upstream, UpstreamPool, build_https_client_with_resolver, parse_upstream_list, Upstream, UpstreamPool,
}; };
use crate::odoh::OdohConfigCache; use crate::odoh::OdohConfigCache;
use crate::override_store::OverrideStore; use crate::override_store::OverrideStore;
@@ -48,6 +49,23 @@ pub async fn run(config_path: String) -> crate::Result<()> {
(dummy, "recursive (root hints)".to_string()) (dummy, "recursive (root hints)".to_string())
}; };
// Routes numa-originated HTTPS (DoH upstream, ODoH relay/target, blocklist
// CDN) away from the system resolver so lookups don't loop back through
// numa when it's its own system DNS.
// See `docs/implementation/bootstrap-resolver.md`.
let resolver_overrides = match config.upstream.mode {
crate::config::UpstreamMode::Odoh => config
.upstream
.odoh_upstream()
.map(|o| o.host_ip_overrides())
.unwrap_or_default(),
_ => std::collections::BTreeMap::new(),
};
let bootstrap_resolver: Arc<NumaResolver> = Arc::new(NumaResolver::new(
&config.upstream.fallback,
resolver_overrides,
));
let (resolved_mode, upstream_auto, pool, upstream_label) = match config.upstream.mode { let (resolved_mode, upstream_auto, pool, upstream_label) = match config.upstream.mode {
crate::config::UpstreamMode::Auto => { crate::config::UpstreamMode::Auto => {
info!("auto mode: probing recursive resolution..."); info!("auto mode: probing recursive resolution...");
@@ -57,7 +75,7 @@ pub async fn run(config_path: String) -> crate::Result<()> {
(crate::config::UpstreamMode::Recursive, false, pool, label) (crate::config::UpstreamMode::Recursive, false, pool, label)
} else { } else {
log::warn!("recursive probe failed — falling back to Quad9 DoH"); log::warn!("recursive probe failed — falling back to Quad9 DoH");
let client = build_https_client(); let client = build_https_client_with_resolver(1, Some(bootstrap_resolver.clone()));
let url = DOH_FALLBACK.to_string(); let url = DOH_FALLBACK.to_string();
let label = url.clone(); let label = url.clone();
let pool = UpstreamPool::new(vec![Upstream::Doh { url, client }], vec![]); let pool = UpstreamPool::new(vec![Upstream::Doh { url, client }], vec![]);
@@ -82,8 +100,16 @@ pub async fn run(config_path: String) -> crate::Result<()> {
config.upstream.address.clone() config.upstream.address.clone()
}; };
let primary = parse_upstream_list(&addrs, config.upstream.port)?; let primary = parse_upstream_list(
let fallback = parse_upstream_list(&config.upstream.fallback, config.upstream.port)?; &addrs,
config.upstream.port,
Some(bootstrap_resolver.clone()),
)?;
let fallback = parse_upstream_list(
&config.upstream.fallback,
config.upstream.port,
Some(bootstrap_resolver.clone()),
)?;
let pool = UpstreamPool::new(primary, fallback); let pool = UpstreamPool::new(primary, fallback);
let label = pool.label(); let label = pool.label();
@@ -96,7 +122,7 @@ pub async fn run(config_path: String) -> crate::Result<()> {
} }
crate::config::UpstreamMode::Odoh => { crate::config::UpstreamMode::Odoh => {
let odoh = config.upstream.odoh_upstream()?; let odoh = config.upstream.odoh_upstream()?;
let client = build_odoh_client(&odoh); let client = build_https_client_with_resolver(1, Some(bootstrap_resolver.clone()));
let target_config = Arc::new(OdohConfigCache::new( let target_config = Arc::new(OdohConfigCache::new(
odoh.target_host.clone(), odoh.target_host.clone(),
client.clone(), client.clone(),
@@ -110,7 +136,11 @@ pub async fn run(config_path: String) -> crate::Result<()> {
let fallback = if odoh.strict { let fallback = if odoh.strict {
Vec::new() Vec::new()
} else { } else {
parse_upstream_list(&config.upstream.fallback, config.upstream.port)? parse_upstream_list(
&config.upstream.fallback,
config.upstream.port,
Some(bootstrap_resolver.clone()),
)?
}; };
let pool = UpstreamPool::new(primary, fallback); let pool = UpstreamPool::new(primary, fallback);
let label = pool.label(); let label = pool.label();
@@ -405,8 +435,9 @@ pub async fn run(config_path: String) -> crate::Result<()> {
if config.blocking.enabled && !blocklist_lists.is_empty() { if config.blocking.enabled && !blocklist_lists.is_empty() {
let bl_ctx = Arc::clone(&ctx); let bl_ctx = Arc::clone(&ctx);
let bl_lists = blocklist_lists.clone(); let bl_lists = blocklist_lists.clone();
let bl_resolver = bootstrap_resolver.clone();
tokio::spawn(async move { tokio::spawn(async move {
load_blocklists(&bl_ctx, &bl_lists).await; load_blocklists(&bl_ctx, &bl_lists, Some(bl_resolver.clone())).await;
// Periodic refresh // Periodic refresh
let mut interval = tokio::time::interval(Duration::from_secs(refresh_hours * 3600)); let mut interval = tokio::time::interval(Duration::from_secs(refresh_hours * 3600));
@@ -414,7 +445,7 @@ pub async fn run(config_path: String) -> crate::Result<()> {
loop { loop {
interval.tick().await; interval.tick().await;
info!("refreshing blocklists..."); info!("refreshing blocklists...");
load_blocklists(&bl_ctx, &bl_lists).await; load_blocklists(&bl_ctx, &bl_lists, Some(bl_resolver.clone())).await;
} }
}); });
} }
@@ -596,8 +627,8 @@ async fn network_watch_loop(ctx: Arc<ServerCtx>) {
} }
} }
async fn load_blocklists(ctx: &ServerCtx, lists: &[String]) { async fn load_blocklists(ctx: &ServerCtx, lists: &[String], resolver: Option<Arc<NumaResolver>>) {
let downloaded = download_blocklists(lists).await; let downloaded = download_blocklists(lists, resolver).await;
// Parse outside the lock to avoid blocking DNS queries during parse (~100ms) // Parse outside the lock to avoid blocking DNS queries during parse (~100ms)
let mut all_domains = std::collections::HashSet::new(); let mut all_domains = std::collections::HashSet::new();
@@ -632,8 +663,10 @@ async fn warm_domain(ctx: &ServerCtx, domain: &str) {
} }
async fn doh_keepalive_loop(ctx: Arc<ServerCtx>) { async fn doh_keepalive_loop(ctx: Arc<ServerCtx>) {
// First tick fires immediately so we surface bootstrap-resolver failures
// (unreachable Quad9/Cloudflare, blocked :53, bad upstream hostname) in
// the startup logs instead of on the first client query.
let mut interval = tokio::time::interval(Duration::from_secs(25)); let mut interval = tokio::time::interval(Duration::from_secs(25));
interval.tick().await; // skip first immediate tick
loop { loop {
interval.tick().await; interval.tick().await;
let pool = ctx.upstream_pool.lock().unwrap().clone(); let pool = ctx.upstream_pool.lock().unwrap().clone();

View File

@@ -0,0 +1,155 @@
#!/usr/bin/env bash
#
# Reproducer for issue #122 — chicken-and-egg when numa is its own system
# resolver (HAOS add-on, Pi-hole-style container, laptop with
# resolv.conf → 127.0.0.1).
#
# Topology:
# container /etc/resolv.conf → nameserver 127.0.0.1
# numa bound on :53 → upstream DoH by hostname (quad9)
# numa boots → spawns blocklist download
# reqwest::get → getaddrinfo("cdn.jsdelivr.net")
# → loopback UDP :53 → numa → cache miss → DoH upstream
# → getaddrinfo("dns.quad9.net") → same loop → glibc EAI_AGAIN
#
# Expected on master: both assertions FAIL (bug reproduced).
# Expected after bootstrap-IP fix: both assertions PASS.
#
# Requirements: docker (with internet access for external lists/DoH)
# Usage: ./tests/docker/self-resolver-loop.sh
set -euo pipefail
cd "$(dirname "$0")/../.."
GREEN="\033[32m"; RED="\033[31m"; RESET="\033[0m"
pass() { printf " ${GREEN}${RESET} %s\n" "$1"; }
fail() { printf " ${RED}${RESET} %s\n" "$1"; printf " %s\n" "$2"; FAILED=$((FAILED+1)); }
FAILED=0
OUT=/tmp/numa-self-resolver.out
echo "── self-resolver-loop: building + reproducing on debian:bookworm ──"
echo " (first run is slow: image pull + cold cargo build, ~5-8 min)"
echo
docker run --rm \
-v "$PWD:/src:ro" \
-v numa-self-resolver-cargo:/root/.cargo \
-v numa-self-resolver-target:/work/target \
debian:bookworm bash -c '
set -e
# Phase 1: install deps + build with the container DNS as given by Docker
# (resolves deb.debian.org, static.rust-lang.org, crates.io).
apt-get update -qq && apt-get install -y -qq curl build-essential dnsutils 2>&1 | tail -3
if ! command -v cargo &>/dev/null; then
curl -sSf https://sh.rustup.rs | sh -s -- -y --profile minimal --quiet
fi
. "$HOME/.cargo/env"
mkdir -p /work
tar -C /src --exclude=./target --exclude=./.git -cf - . | tar -C /work -xf -
cd /work
echo "── cargo build --release --locked ──"
cargo build --release --locked 2>&1 | tail -5
echo
# Phase 2: flip system DNS to numa itself — this is the pathological
# topology from issue #122 (HAOS add-on, resolv.conf → 127.0.0.1).
# Everything after this point, any getaddrinfo call inside numa loops
# back through :53.
echo "nameserver 127.0.0.1" > /etc/resolv.conf
echo "── /etc/resolv.conf inside container (post-flip) ──"
cat /etc/resolv.conf
echo
cat > /tmp/numa.toml <<CONF
[server]
bind_addr = "0.0.0.0:53"
api_port = 5380
api_bind_addr = "127.0.0.1"
data_dir = "/tmp/numa-data"
[upstream]
mode = "forward"
address = ["https://dns.quad9.net/dns-query"]
timeout_ms = 3000
[blocking]
enabled = true
lists = ["https://cdn.jsdelivr.net/gh/hagezi/dns-blocklists@latest/hosts/pro.txt"]
CONF
mkdir -p /tmp/numa-data
echo "── starting numa ──"
RUST_LOG=info ./target/release/numa /tmp/numa.toml > /tmp/numa.log 2>&1 &
NUMA_PID=$!
# Wait up to 120s for blocklist to populate.
# Retry delays 2+10+30s = 42s, plus ~4 × ~10s getaddrinfo timeouts under
# self-loop = ~82s worst case. 120s leaves headroom.
LOADED=0
for i in $(seq 1 120); do
LOADED=$(curl -sf http://127.0.0.1:5380/blocking/stats 2>/dev/null \
| grep -o "\"domains_loaded\":[0-9]*" | cut -d: -f2 || echo 0)
[ "${LOADED:-0}" -gt 100 ] && break
sleep 1
done
# First cold DoH query — time it.
START=$(date +%s%N)
dig @127.0.0.1 example.com A +time=15 +tries=1 > /tmp/dig.out 2>&1 || true
END=$(date +%s%N)
LATENCY_MS=$(( (END - START) / 1000000 ))
STATUS=$(grep -oE "status: [A-Z]+" /tmp/dig.out | head -1 || echo "status: TIMEOUT")
kill $NUMA_PID 2>/dev/null || true
wait $NUMA_PID 2>/dev/null || true
echo
echo "=== RESULT ==="
echo "domains_loaded=$LOADED"
echo "first_query_latency_ms=$LATENCY_MS"
echo "first_query_${STATUS// /_}"
echo
echo "=== numa.log (tail 40) ==="
tail -40 /tmp/numa.log
echo
echo "=== dig.out ==="
cat /tmp/dig.out
' 2>&1 | tee "$OUT"
echo
echo "── assertions ──"
LOADED=$(grep '^domains_loaded=' "$OUT" | tail -1 | cut -d= -f2 || echo 0)
LATENCY=$(grep '^first_query_latency_ms=' "$OUT" | tail -1 | cut -d= -f2 || echo 999999)
STATUS_LINE=$(grep '^first_query_status_' "$OUT" | tail -1 || echo "first_query_status_TIMEOUT")
if [ "${LOADED:-0}" -gt 100 ]; then
pass "blocklist downloaded (domains_loaded=$LOADED)"
else
fail "blocklist downloaded (got domains_loaded=${LOADED:-0}, expected >100)" \
"chicken-and-egg: blocklist HTTPS client has no DNS bootstrap; getaddrinfo loops through numa"
fi
if [ "${LATENCY:-999999}" -lt 2000 ]; then
pass "first DoH query under 2s (latency=${LATENCY}ms, $STATUS_LINE)"
else
fail "first DoH query under 2s (got ${LATENCY}ms, $STATUS_LINE)" \
"self-loop on getaddrinfo(upstream_host); plain DoH needs bootstrap-IP symmetry with ODoH"
fi
echo
if [ "$FAILED" -eq 0 ]; then
printf "${GREEN}── self-resolver-loop passed (fix is in place) ──${RESET}\n"
exit 0
else
printf "${RED}── self-resolver-loop failed ($FAILED assertion(s)) — bug #122 reproduced ──${RESET}\n"
exit 1
fi

View File

@@ -975,6 +975,50 @@ check "Same-host relay+target rejected at startup" \
"same host" \ "same host" \
"$STARTUP_OUT" "$STARTUP_OUT"
# Guards ODoH's zero-plain-DNS-leak property: relay_ip / target_ip must
# land in the bootstrap resolver's override map so reqwest connects direct
# to the configured IPs instead of resolving the hostnames via plain DNS.
# RFC 5737 TEST-NET-1 IPs (unroutable).
cat > "$CONFIG" << 'CONF'
[server]
bind_addr = "127.0.0.1:5354"
api_port = 5381
[upstream]
mode = "odoh"
relay = "https://odoh-relay.example.com/proxy"
target = "https://odoh-target.example.org/dns-query"
relay_ip = "192.0.2.1"
target_ip = "192.0.2.2"
[cache]
max_entries = 10000
[blocking]
enabled = false
[proxy]
enabled = false
CONF
RUST_LOG=info "$BINARY" "$CONFIG" > "$LOG" 2>&1 &
NUMA_PID=$!
for _ in $(seq 1 30); do
curl -sf "http://127.0.0.1:$API_PORT/health" >/dev/null 2>&1 && break
sleep 0.1
done
OVERRIDE_LOG=$(grep 'bootstrap resolver: host overrides' "$LOG" || true)
check "relay_ip wired into bootstrap override map" \
"odoh-relay.example.com=192.0.2.1" \
"$OVERRIDE_LOG"
check "target_ip wired into bootstrap override map" \
"odoh-target.example.org=192.0.2.2" \
"$OVERRIDE_LOG"
kill "$NUMA_PID" 2>/dev/null || true
wait "$NUMA_PID" 2>/dev/null || true
fi # end Suite 8 fi # end Suite 8
# ---- Suite 9: Numa's own ODoH relay (--relay-mode) ---- # ---- Suite 9: Numa's own ODoH relay (--relay-mode) ----