fix(upstream): default hedge_ms=0 to avoid silent 2x upstream query count #135
Summary
Hedging fires a second upstream query after the hedge delay. It's a genuine tail-latency rescue (packet loss, dispatch spikes, TLS handshake stalls), but every lookup appears twice at the upstream provider. On quota'd DNS (NextDNS free tier, Control D, paid Quad9), the default `hedge_ms = 10` silently halves the user's headroom.

Surfaced by #134 (bcookatpcsd): single-address DoT upstream to NextDNS, dashboard showed every query duplicated. Not a protocol bug (hedging is doing what it promises), but a surprising default.
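
To make the query-count cost concrete, here is a minimal sketch of the hedging pattern described above. It is not numa's implementation: `hedged` and `upstream_queries_for_one_lookup` are hypothetical names, the upstream is simulated by a closure that sleeps for 50ms, and treating `hedge_ms == 0` as "hedging off" is an assumption about the setting's semantics.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Runs `query` once; if no answer arrives within `hedge_ms`, runs it a
/// second time and returns whichever attempt answers first.
/// `hedge_ms == 0` disables hedging (assumed semantics, see lead-in).
fn hedged<T, F>(query: F, hedge_ms: u64) -> T
where
    T: Send + 'static,
    F: Fn() -> T + Send + Sync + 'static,
{
    let query = Arc::new(query);
    let (tx, rx) = mpsc::channel();

    // Start one attempt on its own thread and report its answer.
    let launch = |tx: mpsc::Sender<T>| {
        let query = Arc::clone(&query);
        thread::spawn(move || {
            // The receiver may already be gone if the other attempt won.
            let _ = tx.send((*query)());
        });
    };

    launch(tx.clone());
    if hedge_ms > 0 {
        match rx.recv_timeout(Duration::from_millis(hedge_ms)) {
            Ok(answer) => return answer,  // primary beat the hedge delay
            Err(_) => launch(tx.clone()), // delay elapsed: fire the hedge
        }
    }
    drop(tx);
    rx.recv().expect("every attempt ended without an answer")
}

/// Counts how many queries a simulated 50ms upstream sees for one lookup.
fn upstream_queries_for_one_lookup(hedge_ms: u64) -> usize {
    let seen = Arc::new(AtomicUsize::new(0));
    let counter = Arc::clone(&seen);
    let _answer = hedged(
        move || {
            counter.fetch_add(1, Ordering::SeqCst);
            thread::sleep(Duration::from_millis(50));
            "192.0.2.1"
        },
        hedge_ms,
    );
    // Give a just-launched hedge thread time to register before counting.
    thread::sleep(Duration::from_millis(100));
    seen.load(Ordering::SeqCst)
}
```

In this sketch, `upstream_queries_for_one_lookup(0)` returns 1 and `upstream_queries_for_one_lookup(10)` returns 2: whenever the upstream is slower than the hedge delay, the second query is always sent, which is what doubles the count against a quota.
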

This PR flips `default_hedge_ms()` from `10` to `0`. Hedging is opt-in from here; users who want tail-latency rescue add `hedge_ms = 10` (or higher) explicitly.

- `src/config.rs:454`: default changed, plus a short docstring explaining the 0 default and when to turn it back on.
- `numa.toml`: commented example shows the new default and documents the 2× query-count trade-off so it's visible at config time.

No API or config surface changes. No migration. Existing configs that set `hedge_ms = X` explicitly are unchanged.

Test plan
- `cargo check` clean.
- `cargo test --lib config::tests`: 51 passed.
- End-to-end manual test against a counting mock upstream (50ms response delay, so a 10ms hedge fires roughly 40ms before the first response can arrive):
  - Default (no `hedge_ms` line) → mock received 1 packet, numa answered in 56ms.
  - `hedge_ms = 10` → mock received 2 packets, numa answered in 50ms.

Confirms the flip suppresses hedging for new users out of the box while explicit opt-in still fires. Mock source and exact steps are in the PR thread / linked gist (can be posted inline with the review if wanted).
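
For readers without the gist to hand, a counting mock along these lines can be sketched as follows. This is not the mock used for the numbers above: `spawn_counting_mock` is a hypothetical name, plain UDP is an assumption, and the reply is just the query echoed back with the DNS header's QR bit set (an empty response with no answer records), which a resolver may or may not accept.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical counting mock upstream: counts every UDP packet it receives
/// and answers each one after `delay`.
fn spawn_counting_mock(delay: Duration) -> io::Result<(SocketAddr, Arc<AtomicUsize>)> {
    let socket = UdpSocket::bind("127.0.0.1:0")?; // let the OS pick a free port
    let addr = socket.local_addr()?;
    let received = Arc::new(AtomicUsize::new(0));
    let counter = Arc::clone(&received);

    thread::spawn(move || {
        let mut buf = [0u8; 512];
        while let Ok((len, peer)) = socket.recv_from(&mut buf) {
            counter.fetch_add(1, Ordering::SeqCst);

            // Echo the query back, flagged as a response (QR bit in the
            // third header byte). No answer records are added.
            let mut reply = buf[..len].to_vec();
            if reply.len() >= 12 {
                reply[2] |= 0x80;
            }

            // Reply from a separate thread so the delay does not block
            // receiving (and counting) a hedged duplicate.
            let Ok(responder) = socket.try_clone() else { break };
            thread::spawn(move || {
                thread::sleep(delay);
                let _ = responder.send_to(&reply, peer);
            });
        }
    });

    Ok((addr, received))
}
```

Pointing the resolver's upstream at the returned address and reading the counter after a single lookup gives the packet count for each `hedge_ms` setting.
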

CI green (`cargo audit`, `cargo test`, integration).

Follow-ups (not in this PR)
- […] `hedge_ms = 10` if you want the pre-0.14.2 tail-latency behavior".
- […] `hedge_ms = 0` workaround they can apply to 0.14.1 today.