feat(odoh): accept multiple relay/target entries for rotation + failover #140

Open
opened 2026-04-23 17:34:57 +08:00 by razvandimescu · 0 comments
razvandimescu commented 2026-04-23 17:34:57 +08:00 (Migrated from github.com)

Context

Today [upstream] for mode = "odoh" takes a single relay and a single target:

[upstream]
mode = "odoh"
relay = "https://odoh-relay.numa.rs/relay"
target = "https://odoh.cloudflare-dns.com/dns-query"

This was raised in #138: the natural extension to pools is not supported, for example:

target = ["https://odoh.cloudflare-dns.com/dns-query", "https://odoh.crypto.sx/dns-query"]

The single-endpoint shape is fine for correctness but costs on two axes:

  • Resilience: a single relay outage means SERVFAIL in strict mode, or a silent privacy downgrade to fallback in non-strict mode.
  • Anonymity set / traffic analysis: if you're one of a handful of clients on a relay, timing alone can be enough to re-identify you. Rotating across multiple relays dilutes that timing signal; dnscrypt-proxy's anonymized-DNS does this by default.

RFC 9230 itself doesn't mandate 1:1 — the protocol is happy to carry any (relay, target) pair the client picks per query.

Proposed shape

Accept string | [string] for both relay and target, mirroring the existing address / fallback pattern in UpstreamConfig (see src/config.rs:166-173, the string_or_vec deserializer):

[upstream]
mode = "odoh"
relay  = ["https://odoh-relay.numa.rs/relay", "https://odoh-relay.edgecompute.app"]
target = ["https://odoh.cloudflare-dns.com/dns-query", "https://odoh.crypto.sx/dns-query"]
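To make the `string | [string]` shape concrete, here is a minimal std-only sketch of how a single value and an array could be normalized into one list before use. The names `OneOrMany` and `validate_pool` are hypothetical and are not taken from `src/config.rs`; the existing `string_or_vec` deserializer may work differently. The HTTPS-only check is likewise an assumption for illustration.

```rust
/// Hypothetical stand-in for a `string | [string]` config value.
/// With serde this shape is commonly expressed as an untagged enum
/// (`#[serde(untagged)]`); the project's `string_or_vec` helper may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum OneOrMany {
    One(String),
    Many(Vec<String>),
}

impl OneOrMany {
    /// Normalize to a list so downstream code only ever handles `Vec<String>`.
    pub fn into_vec(self) -> Vec<String> {
        match self {
            OneOrMany::One(s) => vec![s],
            OneOrMany::Many(v) => v,
        }
    }
}

/// Validate a normalized pool: reject an empty list and non-HTTPS entries.
/// (Assumed rules, chosen for illustration only.)
pub fn validate_pool(field: &str, urls: &[String]) -> Result<(), String> {
    if urls.is_empty() {
        return Err(format!("{field} must contain at least one URL"));
    }
    for u in urls {
        if !u.starts_with("https://") {
            return Err(format!("{field} entry is not an https URL: {u}"));
        }
    }
    Ok(())
}
```

With this shape, an existing single-string config keeps working unchanged: `OneOrMany::One(url).into_vec()` yields a one-element pool, so the selection code never needs a separate single-endpoint path.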

Selection policy:

  • SRTT-first picking per query, matching the existing address array behavior. Fastest healthy entry wins; dead entries get deprioritized.
  • Same-operator check runs per pair — still refuse (relay, target) pairs sharing an eTLD+1, but only reject configs where no valid pair exists.
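The two selection rules above can be sketched together as a single pair-picking function. This is an illustration under stated assumptions, not the existing SRTT code: `Endpoint`, `same_operator`, and `pick_pair` are hypothetical names, ranking by the sum of relay and target SRTT is one possible reading of "SRTT-first", and the eTLD+1 comparison is a naive "last two labels" approximation. A real check would need the Public Suffix List.

```rust
/// Hypothetical per-endpoint state; field names are assumptions.
#[derive(Debug, Clone)]
pub struct Endpoint {
    pub url: String,
    pub srtt_ms: u32, // smoothed round-trip time estimate
    pub healthy: bool,
}

/// Extract the host part of an https URL (no port, no path).
fn host(url: &str) -> &str {
    let rest = url.strip_prefix("https://").unwrap_or(url);
    let end = rest.find(|c: char| c == '/' || c == ':').unwrap_or(rest.len());
    &rest[..end]
}

/// NAIVE approximation of eTLD+1: the last two DNS labels.
/// This is wrong for suffixes such as `co.uk`; it only illustrates the check.
fn naive_etld_plus_one(url: &str) -> String {
    let labels: Vec<&str> = host(url).split('.').collect();
    let start = labels.len().saturating_sub(2);
    labels[start..].join(".").to_ascii_lowercase()
}

/// True when relay and target appear to belong to the same operator.
pub fn same_operator(relay: &str, target: &str) -> bool {
    naive_etld_plus_one(relay) == naive_etld_plus_one(target)
}

/// Pick the best (relay, target) pair for one query.
/// Same-operator pairs are skipped. Fully healthy pairs sort first, then
/// lowest combined SRTT; unhealthy entries are deprioritized, not removed.
/// `None` means no valid pair exists, i.e. the config should be rejected.
pub fn pick_pair<'a>(
    relays: &'a [Endpoint],
    targets: &'a [Endpoint],
) -> Option<(&'a Endpoint, &'a Endpoint)> {
    relays
        .iter()
        .flat_map(|r| targets.iter().map(move |t| (r, t)))
        .filter(|(r, t)| !same_operator(&r.url, &t.url))
        .min_by_key(|(r, t)| {
            let unhealthy = !(r.healthy && t.healthy); // false sorts before true
            (unhealthy, r.srtt_ms.saturating_add(t.srtt_ms))
        })
}
```

Running `pick_pair` once at config load doubles as the "no valid pair exists" validation, since a `None` result there means every relay and target combination shares an operator.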

Open questions

  1. Independent pools vs. explicit pairs. Independent pools (any relay × any target) are simpler to configure but may pick pairs the user doesn't trust (e.g., two US-based operators). An explicit [[odoh.routes]] table { relay, target } sidesteps this at the cost of verbosity.

  2. Bootstrap IP pinning (relay_ip / target_ip). These are single-valued today. With arrays, options are (a) per-URL map { "https://..." = "1.2.3.4" }, or (b) drop the per-URL pinning and require users who want pinning to use single-endpoint configs. Leaning (a) since pinning is the only way to keep the ODoH endpoint names off the bootstrap resolver — privacy-critical for recursive-self-resolver setups (see docs/implementation/bootstrap-resolver.md).

  3. Failure semantics in strict = true. Does "relay failure" mean a single pair failed, or the whole pool is exhausted? I'd argue exhausted - a single 5xx from one relay shouldn't kill the query if another relay is healthy.
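For question 3, the "pool exhausted" reading could look roughly like the sketch below: candidate pairs are tried in order, and only when every one has failed does the query surface a failure that strict mode would map to SERVFAIL. `QueryOutcome`, `query_with_failover`, and the `send` callback are hypothetical names introduced for illustration; the real transport and error types are not shown in this issue.

```rust
/// Hypothetical result of one client query over a pool of pairs.
#[derive(Debug, PartialEq)]
pub enum QueryOutcome {
    /// Some pair returned a response body.
    Answer(Vec<u8>),
    /// Every candidate pair failed; strict mode would map this to SERVFAIL.
    PoolExhausted { attempts: usize },
}

/// Try each (relay, target) pair in the given order until one succeeds.
/// `send` is an assumed callback standing in for the actual ODoH exchange;
/// it returns the response bytes or an error description (e.g. a 5xx).
pub fn query_with_failover<F>(pairs: &[(String, String)], mut send: F) -> QueryOutcome
where
    F: FnMut(&str, &str) -> Result<Vec<u8>, String>,
{
    let mut attempts = 0;
    for (relay, target) in pairs {
        attempts += 1;
        match send(relay, target) {
            Ok(resp) => return QueryOutcome::Answer(resp),
            // A single failing pair does not fail the query; move on.
            Err(_) => continue,
        }
    }
    QueryOutcome::PoolExhausted { attempts }
}
```

Under this reading, a 5xx from one relay costs one extra attempt rather than the whole query. A real implementation would likely also want a per-query attempt or time budget so that a large pool of dead relays cannot stall resolution.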

Reference: dearsky/numa#140