fix(blocklist): retry on transient download failures (#122) #125

Merged
razvandimescu merged 2 commits from worktree-fix-blocklist-bootstrap into main 2026-04-21 00:22:05 +08:00
razvandimescu commented 2026-04-21 00:08:37 +08:00 (Migrated from github.com)

Summary

  • Closes #122. Blocklist downloads now survive numa's own cold-start resolution race: three retries with 2s/10s/30s backoff. By the second attempt, numa's upstream DoH connection is warm and getaddrinfo round-trips in <100ms.
  • Downloads run in parallel across lists via futures::future::join_all (different hosts, no shared-warming benefit from sequencing). Matches the pattern already used at src/api.rs:817.
  • Walk the full std::error::Error::source() chain when logging failures, so the reqwest warnings actually reveal what broke (TLS, DNS, connect refused) instead of the top-level "error sending request" blur the reporter hit.
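
For reviewers who want the shape of the retry and logging behaviour without opening the diff, here is a minimal std-only sketch. It is not the code in this PR: `retry_with_delays`, `error_chain`, and `DownloadError` are hypothetical names, the real change works on async reqwest calls, and the sketch assumes one initial attempt followed by one retry per entry in the delay schedule.

```rust
use std::error::Error;
use std::fmt;
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical two-layer error standing in for a wrapped HTTP client error:
/// a generic top-level message with the real cause behind `source()`.
#[derive(Debug)]
struct DownloadError {
    cause: std::io::Error,
}

impl fmt::Display for DownloadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "error sending request")
    }
}

impl Error for DownloadError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}

/// Joins an error and every error reachable through `source()` into one
/// string, so the log line shows the root cause and not only the top layer.
fn error_chain(err: &(dyn Error + 'static)) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(inner) = current {
        out.push_str(": ");
        out.push_str(&inner.to_string());
        current = inner.source();
    }
    out
}

/// Runs `attempt` once, then once more after each delay in `delays`.
/// Returns the first success, or `None` when every attempt fails.
/// (Assumption: the schedule lists the waits *between* attempts.)
fn retry_with_delays<T, E, F>(delays: &[Duration], mut attempt: F) -> Option<T>
where
    E: Error + 'static,
    F: FnMut() -> Result<T, E>,
{
    let total = delays.len() + 1;
    for n in 0..total {
        match attempt() {
            Ok(value) => return Some(value),
            Err(e) => {
                eprintln!("attempt {}/{} failed: {}", n + 1, total, error_chain(&e));
                // Sleep only if another attempt follows.
                if let Some(delay) = delays.get(n) {
                    sleep(*delay);
                }
            }
        }
    }
    None
}
```

With a schedule of `[2s, 10s, 30s]` this makes at most four attempts. Passing the schedule as a slice is what lets a test substitute zero delays and finish in milliseconds.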

Test plan

  • cargo test --lib blocklist:: — 13 tests pass, including two new ones:
    • retry_succeeds_after_transient_failure — flaky TCP listener drops the first 2 connections, serves the 3rd. Asserts the body comes through.
    • retry_gives_up_when_all_attempts_fail — more failures than retries, asserts None.
    • Both use &[0, 0, 0] delay schedule via the test-only fetch_with_retry_delays entrypoint; combined runtime ~20ms.
  • make all — full lint + 346 unit tests green.
  • ./tests/integration.sh release — all 99 integration tests pass across DNS resolution, caching, blocking, API, DoT, DoH, ODoH.
  • Manual HAOS verification by the reporter (@Guara92) — will ping on the issue once a release binary is available.
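
The flaky-listener idea behind the two new tests can be sketched with the standard library alone. This is an illustration, not the test code in this PR: it speaks raw TCP rather than HTTP, treats an empty response as a failed attempt, and `spawn_flaky_server`, `fetch_once`, and `fetch_with_attempts` are hypothetical names.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Binds an ephemeral local port and spawns a thread that closes the first
/// `failures` connections without replying, then writes `body` to the next
/// connection and shuts the listener down.
fn spawn_flaky_server(failures: usize, body: &'static str) -> SocketAddr {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind ephemeral port");
    let addr = listener.local_addr().expect("read local address");
    thread::spawn(move || {
        for (i, conn) in listener.incoming().enumerate() {
            let mut stream = match conn {
                Ok(s) => s,
                Err(_) => continue,
            };
            if i < failures {
                // Simulate a transient failure: close without sending anything.
                drop(stream);
                continue;
            }
            let _ = stream.write_all(body.as_bytes());
            break;
        }
    });
    addr
}

/// One attempt: connect, read until EOF, and treat an empty reply as failure.
fn fetch_once(addr: SocketAddr) -> Option<String> {
    let mut stream = TcpStream::connect(addr).ok()?;
    let mut buf = String::new();
    stream.read_to_string(&mut buf).ok()?;
    if buf.is_empty() {
        None
    } else {
        Some(buf)
    }
}

/// Tries up to `attempts` times with no delay in between (the zero-delay
/// schedule a test would use) and returns the first non-empty body.
fn fetch_with_attempts(addr: SocketAddr, attempts: usize) -> Option<String> {
    (0..attempts).find_map(|_| fetch_once(addr))
}
```

Binding the listener before spawning the thread means early connections queue in the accept backlog, so the client cannot race the server's startup.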