Commit Graph

57 Commits

Author SHA1 Message Date
Razvan Dimescu
459395203d style: cargo fmt 2026-04-21 16:30:26 +03:00
Razvan Dimescu
31adc31c9b refactor(ctx): coalesce forward-path upstream queries
resolve_coalesced now takes leader_path: QueryPath and applies to all
three upstream branches (Forwarded-rule, Recursive, Upstream), not just
Recursive. Fixes thundering-herd at boot when N concurrent HTTPS setups
each trigger independent forward queries for the same upstream hostname.
2026-04-21 16:18:52 +03:00
Razvan Dimescu
241c40553b feat(odoh): ship ODoH client + self-hosted relay (RFC 9230)
Client (mode = "odoh"): URL-query target routing per RFC 9230 §5,
/.well-known/odohconfigs TTL cache with 60s backoff on failure, HPKE
seal/open via odoh-rs, strict-mode default that SERVFAILs on relay
failure instead of silently downgrading. Host-equality config
validation rejects same-operator relay/target pairs.

Relay (`numa relay [PORT]`): axum server with /relay + /health.
SSRF-hardened hostname validator (RFC 1035 ASCII + dot + dash),
4 KiB body cap at the axum layer, 5s full-transaction timeout, and
static 502 on target failure (reqwest internals logged, not leaked).
Aggregate counters only — no per-request logs.

Observability: new `UpstreamTransport { Udp, Doh, Dot, Odoh }`
orthogonal to `QueryPath`, so /stats can tally wire protocols
symmetrically. Recursive mode records `Some(Udp)` for honest
"bytes egressing in cleartext" accounting.

Tests: Suite 8 exercises the client end-to-end via Frank Denis's
public relay + Cloudflare target; Suite 9 exercises `numa relay`
forwarding + guards against Cloudflare as the real far end. Full
probe script at tests/probe-odoh-ecosystem.sh verifies the entire
public ODoH ecosystem (4 targets + 1 relay per DNSCrypt's curated
list — confirms deploying Numa's relay doubles global supply).
2026-04-20 12:34:04 +03:00
Razvan Dimescu
24610ae3fe feat(question): add SVCB, LOC, NAPTR variants to QueryType
Logs were printing UNKNOWN(64), UNKNOWN(29), UNKNOWN(35) for SVCB,
LOC, and NAPTR — three RR types that have been registered for years
and show up in the wild (notably SVCB via RFC 9462 DDR clients
querying _dns.resolver.arpa).

Adds the variants and replaces the SVCB_QTYPE u16 const introduced
in #119 with QueryType::SVCB.to_num(), matching the HTTPS path.

Closes #114.
2026-04-19 07:49:35 +03:00
Razvan Dimescu
f9e996ae78 fmt: drop redundant comments per house style 2026-04-19 06:53:47 +03:00
Razvan Dimescu
5e85b147b9 feat(resolver): apply ipv6hint strip to SVCB (type 64) too
HTTPS (65) and SVCB (64) share the RDATA wire format, so the existing
parser already handles both — only the call site was HTTPS-only. Widen
the qtype check and extend the existing pipeline test with a second
query for SVCB.
2026-04-19 06:52:30 +03:00
Razvan Dimescu
d6bb9a0f01 fmt: rustfmt vec literal wrapping + signature collapse 2026-04-19 06:24:54 +03:00
Razvan Dimescu
61ea2e510d refactor: dedupe HTTPS_TYPE, record-walk, and test rdata builder
- Drop `const HTTPS_TYPE: u16 = 65;` in favor of `QueryType::HTTPS.to_num()`
  at the single call site — avoids a fresh magic number alongside the
  existing enum mapping in question.rs.
- Add `DnsPacket::for_each_record_mut` so `strip_https_ipv6_hints` stops
  hand-rolling the answers/authorities/resources walk; future section
  rewrites go through the same helper.
- Promote the SVCB test-rdata builder from `svcb::tests` to module scope
  as `pub(crate) #[cfg(test)] fn build_rdata`, and reuse it in the two
  pipeline tests in ctx.rs — kills ~20 lines of byte-fiddling and keeps
  one RDATA-construction code path.
2026-04-19 05:58:47 +03:00
Razvan Dimescu
22dd3cd222 fix(resolver): skip ipv6hint strip for DO-bit clients
Modifying HTTPS rdata invalidates any accompanying RRSIG, so a DNSSEC-
validating downstream would reject the response as Bogus. Gate the
strip on !client_do, matching the existing DNSSEC-records strip.

Adds a regression test that catches the gate being removed: builds a
query with EDNS DO=1, asserts the HTTPS rdata round-trips untouched.
2026-04-19 05:52:37 +03:00
Razvan Dimescu
be98a02e49 feat(resolver): filter_aaaa for IPv4-only networks (#112)
When enabled, AAAA queries short-circuit to NODATA (NOERROR + empty
answer) so Happy Eyeballs clients don't stall waiting on a v6 address
they can't use. Also strips `ipv6hint` SvcParam from HTTPS/SVCB
answers (RFC 9460) so Chrome ≥103, Firefox, and Safari don't bypass
the AAAA filter via the HTTPS record path.

Local data is preserved: overrides, zones, the .numa proxy, and the
blocklist sinkhole keep whatever v6 addresses they configure — the
filter only kicks in on the cache/forward/recursive path. NODATA is
correct per RFC 2308 here; NXDOMAIN would incorrectly imply the name
doesn't exist for A queries either.

Off by default. Opt in via `filter_aaaa = true` under `[server]`.
2026-04-18 19:52:06 +03:00
Razvan Dimescu
4afc56a052 Merge main into feat/forwarding-array-upstream
Resolves conflict in src/ctx.rs — both sides added independent tokio
tests (forwarding fail-over on this branch, default-pool upstream path
on main from #103). Keep both.
2026-04-15 21:28:04 +03:00
Razvan Dimescu
fef43635d6 fix(ci): rustfmt import order and gate Upstream import for Windows 2026-04-15 04:11:27 +03:00
Razvan Dimescu
9a0d586b13 feat: accept array of upstreams in [[forwarding]]
Mirrors `[upstream] address` — `upstream` accepts string or array
of strings, builds an `UpstreamPool` and routes queries through
`forward_with_failover_raw` so SRTT ordering and failover apply to
matched `[[forwarding]]` rules the same way they do for the default
pool.

Single-string rules keep their current behavior (one-element pool,
equivalent single-upstream path). Empty array errors at config load.

Addresses item 1 of issue #102. Plan: docs/102_item1.md.
2026-04-15 04:03:38 +03:00
Razvan Dimescu
ebb2a5db39 refactor: simplify upstream-path test — reuse pool mutex, drop narrating comment 2026-04-14 18:26:45 +03:00
Razvan Dimescu
e0e0f50838 feat: distinguish UPSTREAM vs FORWARD in logs and stats
Queries matching a [[forwarding]] suffix rule now log as FORWARD;
queries resolved via the default [upstream] pool log as UPSTREAM.
Previously both paths shared the FORWARD label, making it impossible
to tell from logs whether a rule matched.

Adds QueryPath::Upstream, a queries.upstream stats counter exposed
via /stats, plus a matching dashboard filter, bar, and path tag.

Closes part of #102.
2026-04-14 18:18:32 +03:00
Razvan Dimescu
b4b939c78b fix: accept tls:// and https:// in [[forwarding]] upstreams
Config-level forwarding rules were parsed with the UDP-only
`parse_upstream_addr` helper, silently rejecting the DoT/DoH schemes
that the rest of the forwarding pipeline already supports.

Widen `ForwardingRule.upstream` from `SocketAddr` to `Upstream` so
config rules reuse the same parser as `[upstream].address` and
`fallback`. Demote `parse_upstream_addr` to `pub(crate)` to prevent
the same mistake recurring.

Closes #100.
2026-04-14 09:22:24 +03:00
Razvan Dimescu
d3f046da4c style: assert loopback addr in subdomain test, trim verbose comment 2026-04-13 08:10:26 +03:00
Razvan Dimescu
0bdde40f40 test: verify forwarded response content from mock upstream 2026-04-13 08:07:58 +03:00
Razvan Dimescu
155c1c4da0 test: full-pipeline coverage for every resolve_query step
Test each pipeline stage in isolation through resolve_query:
- override takes precedence over all other paths
- localhost and *.localhost resolve to loopback
- local zone returns configured records
- .tld proxy resolves registered services to loopback
- blocklist sinkholes to 0.0.0.0
- cache hit returns stored response without upstream
2026-04-13 08:04:59 +03:00
Razvan Dimescu
b40004fe5e refactor: extract shared test infrastructure into testutil module
- test_ctx(): single ServerCtx builder, replaces 3 copies (ctx/api/dot)
- mock_upstream(): canned DNS response server for forwarding tests
- blackhole_upstream(): unresponsive socket for timeout tests
- Removes ~100 lines of duplicated 30-field struct literals
2026-04-13 07:56:47 +03:00
Razvan Dimescu
b8ddc16027 refactor: return QueryPath from resolve_query, add mock upstream to tests
resolve_query now returns (BytePacketBuffer, QueryPath) so callers
and tests can inspect the resolution path without reading the query
log. Production call sites (UDP, DoT, DoH) destructure and ignore it.

The forwarding test now uses a mock UDP upstream that replies with a
canned response, asserting QueryPath::Forwarded instead of != Local.
2026-04-13 07:51:14 +03:00
Razvan Dimescu
48f67be2f1 refactor: deduplicate test_ctx by delegating to test_ctx_with_forwarding 2026-04-13 07:39:55 +03:00
Razvan Dimescu
ca00846393 fix: forwarding rules override special-use NXDOMAIN for private PTR zones
Explicit [[forwarding]] rules now take precedence over the RFC 6303
special-use domain intercept. Previously, PTR queries for private
ranges (e.g. 168.192.in-addr.arpa) always returned local NXDOMAIN
even when a forwarding rule pointed them at a corporate DNS server.

Add full-pipeline resolve_query test harness (test_ctx + resolve_in_test)
and two tests covering both the default behavior and the override.

Closes #94
2026-04-13 07:36:53 +03:00
Razvan Dimescu
2101dfcf17 feat: transport protocol tracking (UDP/TCP/DoT/DoH) with dashboard visualization
Thread Transport enum through resolve pipeline, record per-query
transport in stats and query log. Dashboard gets bar chart panel
with encryption %, transport column in query log, and filter dropdown.
2026-04-12 22:14:26 +03:00
Razvan Dimescu
6d9ee14ea6 refactor: unify warm_stale/warm_domain, remove raw_wire alloc, add Freshness enum
- Extract refresh_entry in ctx.rs — warm_domain in main.rs now delegates
  to it instead of duplicating the resolve+cache logic (~40 lines removed)
- Eliminate unconditional .to_vec() of raw wire on every UDP/DoT query —
  pass &buffer.buf[..len] directly (zero-cost for cache hits)
- Replace bare bool stale flag with Freshness enum (Fresh/NearExpiry/Stale)
  making the three states self-documenting at every call site
2026-04-12 19:56:42 +03:00
Razvan Dimescu
3c49b0e65d fix: deduplicate background refresh with per-domain guard
Multiple stale queries for the same domain now spawn only one background
refresh. A HashSet<(String, QueryType)> on ServerCtx tracks in-flight
refreshes; subsequent stale hits for the same key skip the spawn.
2026-04-12 19:49:23 +03:00
Razvan Dimescu
571ce2f013 feat: background refresh on stale cache hit (RFC 8767 revalidation)
When a cached entry is expired but within the 1-hour stale window,
serve it immediately with TTL=1 AND spawn a background re-resolve.
The next query gets a fresh entry instead of another stale serve.

Without this, stale entries were served repeatedly for up to an hour
with no refresh — effectively ignoring TTL.
2026-04-12 19:42:56 +03:00
Razvan Dimescu
628ed00074 refactor: extract cache_and_parse, remove dead truncation log, restore TCP_TIMEOUT to 400ms 2026-04-12 18:40:46 +03:00
Razvan Dimescu
72b540a44a feat: wire-level cache, serve-stale, raw wire passthrough
- Cache stores raw DNS wire bytes + TTL offsets (2.4x memory reduction)
- Serve-stale (RFC 8767): expired entries returned with TTL=1 for 1hr
- handle_query captures raw_len from recv_from for zero-copy forwarding
- resolve_query accepts raw wire bytes, forwards without re-serializing
- wire.rs: TTL offset scanner, ID/TTL patching, question extraction
- 52 wire tests + 16 cache regression tests
2026-04-12 18:40:46 +03:00
Razvan Dimescu
7efac85836 feat: wire-level forwarding, cache, request hedging, and DoH keepalive
Wire-level forwarding path skips DnsPacket parse/serialize on the hot
path. Cache stores raw wire bytes with pre-scanned TTL offsets — patches
ID + TTLs in-place on lookup instead of cloning parsed packets.

Request hedging (Dean & Barroso "Tail at Scale") fires a second
parallel request after a configurable delay (default 10ms) when
the primary upstream stalls. DoH keepalive loop prevents idle
HTTP/2 + TLS connection teardown.

Recursive resolver now hedges across multiple NS addresses and
caches NS delegation records to skip TLD re-queries.

Integration test harness polls /blocking/stats instead of fixed
sleep, eliminating the blocklist-download race condition.
2026-04-12 18:39:48 +03:00
Razvan Dimescu
7d6b0ed568 feat: DoH server endpoint + DoT enabled by default (#79)
* chore: document multi-forwarder and cache warming in config and README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: DNS-over-HTTPS server endpoint (RFC 8484)

Serve DoH at POST /dns-query on the existing HTTPS proxy (port 443).
Automatically enabled when proxy TLS is active — no config needed.
Also fix zone map priority so local zones override RFC 6762 .local
special-use handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove GoatCounter analytics from site

GoatCounter domains (goatcounter.com, gc.zgo.at) are blocked by
Hagezi Pro, which is Numa's default blocklist. A DNS privacy tool
should not embed analytics that its own resolver blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: enable DoT listener by default

DoT now starts automatically with `sudo numa`, matching the proxy and
DoH which are already on by default. The self-signed CA infrastructure
is shared with the proxy, so there is no additional setup. This makes
`numa setup-phone` work out of the box.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 04:06:17 +03:00
Razvan Dimescu
8abcd91f95 feat: multi-forwarder with SRTT-based failover (#77)
* feat: multi-forwarder with SRTT-based failover

address accepts string or array, with optional per-server port override.
New fallback pool tried only when all primaries fail. Sequential failover
with SRTT ranking ensures fastest upstream is tried first.

Closes #34 (items 1, 2, 3)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: simplify failover candidate list and deduplicate recursive pool

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract maybe_update_primary for testable upstream re-detection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: rustfmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 00:26:58 +03:00
Razvan Dimescu
2c20c56421 feat: mobile setup — QR onboarding, Wi-Fi scoped mobileconfig (#73)
* fix: scope mobileconfig DNS to Wi-Fi only via OnDemandRules

Without OnDemandRules, iOS applies the DoT profile globally —
cellular DNS breaks when the phone leaves the LAN.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: phone setup QR code in dashboard header

- Add /qr endpoint serving SVG (uses existing qrcode crate, svg feature)
- Header popover: QR on desktop, direct download link on mobile viewports
- Only visible when [mobile] enabled = true in config
- Expose mobile.enabled and mobile.port in /stats response
- Lazy-load QR on first click, dismiss on outside click

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add Cache-Control to /qr, re-fetch QR on each popover open

Cache-Control: no-store prevents stale QR after LAN IP change.
Remove qrLoaded flag so the QR always reflects the current IP.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: rustfmt serve_qr response tuple

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add iOS install steps to phone setup popover

iOS shows "Profile Downloaded" with no guidance. The popover
now includes the 3-step install flow including the buried
Certificate Trust Settings toggle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 22:21:51 +03:00
Razvan Dimescu
de15b32325 feat: numa setup-phone — QR-based mobile DoT onboarding (#38)
* feat: numa setup-phone — QR-based mobile DoT onboarding

Adds a CLI subcommand that generates a one-time mobileconfig profile
containing both the Numa local CA (as a com.apple.security.root payload)
and the DoT DNS settings, then serves it via a temporary HTTP server
and prints a scannable QR code in the terminal.

Flow:
  1. User runs `numa setup-phone` (no sudo needed)
  2. Detects current LAN IP, reads CA from /usr/local/var/numa/ca.pem
  3. Builds combined mobileconfig (CA trust + DoT)
  4. Renders QR code with qrcode crate (Unicode block characters)
  5. Serves the profile on port 8765, stays open until Ctrl+C
  6. Counts successful downloads (multi-device households)

Important caveat documented in instructions: even with the CA bundled
in the profile, iOS still requires the user to manually enable trust
in Settings → General → About → Certificate Trust Settings. Verified
on a real iPhone.

Stable PayloadIdentifiers/UUIDs ensure re-running replaces the
existing profile on iOS rather than accumulating duplicates.

- New module: src/setup_phone.rs (~270 lines)
- New CLI subcommand: `numa setup-phone`
- New dependency: qrcode = "0.14" (default-features = false)
- tokio "signal" feature added for Ctrl+C handling
- 3 unit tests: PEM stripping, mobileconfig generation, QR rendering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: mobile API, enriched /health, mobileconfig module

Adds a persistent read-only HTTP listener (default port 8765, LAN-bound)
serving a dedicated subset of Numa's API for iOS/Android companion apps
and as a replacement for the one-shot server setup_phone used to spin up:

  GET /health           — enriched JSON with version, hostname, LAN IP,
                          SNI, DoT config, mobile API port, CA
                          fingerprint, features (shared handler with
                          the main API on port 5380)
  GET /ca.pem           — public CA certificate (shared handler)
  GET /mobileconfig     — full iOS profile (CA trust + DNS settings
                          pinned to current LAN IP)
  GET /ca.mobileconfig  — CA-only iOS profile (trust anchor without
                          DNS override — for the iOS companion app's
                          programmatic DNS flow via NEDNSSettingsManager)

All routes are idempotent GETs. The mobile API never serves the
state-mutating routes that live on the main API (overrides, blocking
toggle, service CRUD, cache flush), so it is safe to expose on the LAN
regardless of the main API's bind address. The CA private key is never
served by any route.

Opt-in via `[mobile] enabled = true`. Default is false so new installs
do not silently expose a LAN listener after upgrading; our committed
numa.toml template enables it explicitly for spike testing.

New modules:

- src/mobileconfig.rs — ProfileMode::{Full, CaOnly} enum with plist
  builder lifted from setup_phone.rs. Full and CaOnly share the CA
  payload UUID (same trust anchor) but have distinct top-level UUIDs
  so they coexist as separate installable profiles on iOS.

- src/health.rs — HealthMeta cached metadata built once at startup
  from config + CA fingerprint (SHA-256 of the PEM via ring), and the
  HealthResponse JSON shape shared between the main and mobile APIs.

- src/mobile_api.rs — axum Router for the persistent listener. Reuses
  api::health and api::serve_ca from the main API; owns the two
  mobileconfig handlers.

Modified:

- src/api.rs — health() returns the enriched HealthResponse, now pub.
  serve_ca is now pub so mobile_api can reuse it.
- src/config.rs — MobileConfig section (enabled, port, bind_addr).
- src/ctx.rs — health_meta: HealthMeta field on ServerCtx.
- src/main.rs — builds HealthMeta at startup, spawns mobile API
  listener if enabled.
- src/lan.rs — build_announcement takes &HealthMeta and writes
  enriched TXT records (version, api_port, proto, dot_port, ca_fp).
  SRV port now reports the mobile API port; peer discovery still
  reads TXT `services=` so this is backwards compatible. Always
  announces even when no .numa services are registered, so the iOS
  companion app can discover Numa via mDNS regardless of service
  state.
- src/setup_phone.rs — reduced from 267 to 100 lines. The CLI is now
  a thin QR wrapper over the persistent /mobileconfig endpoint; the
  hand-rolled one-shot HTTP server (accept_loop, RUST_OK_HEADERS,
  RUST_NOT_FOUND, download counter) is gone.
- src/dot.rs — test fixture updated with HealthMeta::test_fixture().
- numa.toml — commented [mobile] section, enabled = true for spike.

Tests: 136 unit tests passing (5 new in mobileconfig, 3 new in health).
cargo clippy clean. Integration sanity check: curl'd /health, /ca.pem,
/mobileconfig, /ca.mobileconfig against a running numa — all return
200 with correct content types and valid response bodies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: setup-phone probe, unknown command error, query source in dashboard

- setup-phone now probes the mobile API before printing the QR code
  and shows an actionable error if [mobile] is not enabled
- Unknown CLI subcommands print an error instead of silently
  attempting to start a full server
- Dashboard query log shows source IP under timestamp (localhost
  for loopback, full IP for LAN devices) with full addr on hover

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:08:56 +03:00
Razvan Dimescu
1239ed0e72 fix: parse DoT queries up-front and echo question in SERVFAIL
Address review findings on PR #25:

- Refactor resolve_query to take a pre-parsed DnsPacket. Parse-error
  handling moves to the UDP caller, eliminating the double warn! line
  on malformed UDP queries.
- Enforce MIN_MSG_LEN=12 (DNS header) in handle_dot_connection so
  query_id extraction is always reading client-sent bytes, not the
  zeroed buffer tail.
- Parse the DoT query before calling resolve_query and retain it, so
  SERVFAIL responses can echo the original question section via
  response_from(). Parse failures send FORMERR with the client id.
- Extract write_framed() helper for length-prefix + flush, reused by
  success, SERVFAIL, and FORMERR paths.
- Back off 100ms on listener.accept() errors to avoid tight-looping
  on fd exhaustion.
- Replace the hardcoded 127.0.0.1:53 upstream in dot_nxdomain_for_unknown
  with a bound-but-unresponsive UDP socket owned by the test, making it
  independent of the host's local resolver. Test now runs in ~220ms
  (timeout lowered to 200ms) instead of 3s and asserts the question is
  echoed in the SERVFAIL response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
aa8923b2c6 fix: add debug logging for DoT SERVFAIL serialization failure, TC-bit TODO
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
e4350ae81c feat: add DNS-over-TLS (DoT) listener (RFC 7858)
Refactor handle_query into transport-agnostic resolve_query that returns
a BytePacketBuffer, keeping the UDP path zero-alloc. Add a TLS listener
on port 853 with persistent connections, idle timeout, connection limits,
and coalesced writes. Supports user-provided certs or self-signed CA
fallback. Includes 5 integration tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
8c421b9fa3 fix: check forwarding rules before recursive resolution (#29)
Conditional forwarding (Tailscale .ts.net, VPC private zones) was
only checked in the forward mode branch. In recursive mode, queries
for forwarding-rule domains went to root servers instead of the
configured upstream, returning NXDOMAIN for private domains.

Move the forwarding rule check before the recursive/forward branch
so it takes priority regardless of mode.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 00:07:11 +03:00
Razvan Dimescu
ff1200eb10 feat: resolve .numa services to LAN IP for remote clients (#23)
* feat: resolve .numa services to LAN IP for remote clients

Remote DNS clients (e.g. phones on same WiFi) received 127.0.0.1 for
local .numa services, which is unreachable from their perspective.
Now returns the host's LAN IP when the query originates from a
non-loopback address. Also auto-widens proxy bind to 0.0.0.0 when
DNS is already public, and adds a startup warning when the proxy
remains localhost-only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: respect proxy bind_addr config, don't auto-widen

The auto-widen silently overrode an explicit config value — the user's
config should be the source of truth. Now the proxy always uses the
configured bind_addr, and the warning fires whenever it's 127.0.0.1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update proxy bind_addr comment in example config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 23:15:42 +03:00
Razvan Dimescu
49535568d9 refactor: deduplicate query builders, record extraction, sinkhole records (#22)
- Add DnsPacket::query(id, domain, qtype) constructor; replace mock_query,
  make_query, and 4 inline constructions across ctx/forward/recursive/api
- Add record_to_addr() in recursive.rs; replace 4 identical A/AAAA match
  blocks with filter_map one-liners
- Add sinkhole_record() in ctx.rs; consolidate localhost and blocklist
  A/AAAA branching into single calls
- Remove now-unused DnsQuestion imports

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 14:22:07 +03:00
Razvan Dimescu
669498e85f refactor: extract resolve_coalesced, test real code (#21)
* refactor: extract resolve_coalesced, rewrite tests against real code

Extract Disposition enum, acquire_inflight(), and resolve_coalesced()
from handle_query so coalescing logic is independently testable. Rewrite
integration tests to call resolve_coalesced directly with mock futures
instead of fighting the iterative resolver's NS chain. All 12 coalescing
tests now exercise production code paths, not tokio primitives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: SERVFAIL echoes question section, preserve error messages

resolve_coalesced now takes &DnsPacket instead of query_id so SERVFAIL
responses use response_from (echoing question section per RFC). Error
messages preserved via Option<String> return for upstream error logging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 11:14:25 +03:00
Razvan Dimescu
30e46e549c feat: in-flight query coalescing with COALESCED path (#20)
* feat: in-flight query coalescing for recursive resolver

When multiple queries for the same (domain, qtype) arrive concurrently
and all miss the cache, only the first triggers recursive resolution.
Subsequent queries wait on a broadcast channel for the result.

Prevents thundering herd where N concurrent cache misses each
independently walk the full NS chain, compounding timeouts.

Uses InflightGuard (Drop impl) to guarantee map cleanup on
panic/cancellation — prevents permanent SERVFAIL poisoning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: add InflightMap type alias for clippy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add COALESCED query path and coalescing tests

Followers in the inflight coalescing path now log as COALESCED instead
of RECURSIVE, making it visible in the dashboard when queries were
deduplicated vs independently resolved. Adds 10 tests covering
InflightGuard cleanup, broadcast mechanics, and concurrent handle_query
coalescing through a mock TCP DNS server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract acquire_inflight, rewrite tests against real code

Move Disposition enum and inflight acquisition logic into a standalone
acquire_inflight() function. Rewrite 4 tests that were exercising tokio
primitives to call the real coalescing code path instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 10:36:02 +03:00
Razvan Dimescu
06d4e91cd2 feat: SRTT-based nameserver selection (#19)
* feat: SRTT-based nameserver selection for recursive resolver

BIND-style Smoothed RTT (EWMA) tracking per NS IP address. The resolver
learns which nameservers respond fastest and prefers them, eliminating
cascading timeouts from slow/unreachable IPv6 servers.

- New src/srtt.rs: SrttCache with record_rtt, record_failure, sort_by_rtt
- EWMA formula: new = (old * 7 + sample) / 8, 5s failure penalty, 5min decay
- TCP penalty (+100ms) lets SRTT naturally deprioritize IPv6-over-TCP
- Enabled flag embedded in SrttCache (no-op when disabled)
- Batch eviction (64 entries) for O(1) amortized writes at capacity
- Configurable via [upstream] srtt = true/false (default: true)
- Benchmark script: scripts/benchmark.sh (full, cold, warm, compare-all)
- Benchmarks show 12x avg improvement, 0% queries >1s (was 58%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: show DNSSEC and SRTT status in dashboard + API

Add dnssec and srtt boolean fields to /stats API response.
Display on/off indicators in the dashboard footer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: apply SRTT decay before EWMA so recovered servers rehabilitate

Without decay-before-EWMA, a server penalized at 5000ms stayed near
that value even after recovery — the stale raw penalty was used as the
EWMA base instead of the decayed estimate. Extract decayed_srtt()
helper and call it in record_rtt() before the smoothing step.

Also restores removed "why" comments in send_query / resolve_recursive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add install/upgrade instructions, smarter benchmark priming

README: document `numa install`, `numa service`, Homebrew upgrade,
and `make deploy` workflows. Benchmark: replace fixed `sleep 4` with
`wait_for_priming` that polls cache entry count for stability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 23:22:31 +02:00
Razvan Dimescu
71dbb138bc fix: return NXDOMAIN for .local queries instead of SERVFAIL (#18)
.local is reserved for mDNS (RFC 6762) and cannot be resolved by
upstream DNS servers. Add it to is_special_use_domain() so queries
like _grpc_config.localhost.local get an immediate NXDOMAIN instead
of timing out and returning SERVFAIL.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 22:42:33 +02:00
Razvan Dimescu
a84f2e7f1d feat: recursive DNS + DNSSEC + TCP fallback (#17)
* feat: recursive resolution + full DNSSEC validation

Numa becomes a true DNS resolver — resolves from root nameservers
with complete DNSSEC chain-of-trust verification.

Recursive resolution:
- Iterative RFC 1034 from configurable root hints (13 default)
- CNAME chasing (depth 8), referral following (depth 10)
- A+AAAA glue extraction, IPv6 nameserver support
- TLD priming: NS + DS + DNSKEY for 34 gTLDs + EU ccTLDs
- Config: mode = "recursive" in [upstream], root_hints, prime_tlds

DNSSEC (all 4 phases):
- EDNS0 OPT pseudo-record (DO bit, 1232 payload per DNS Flag Day 2020)
- DNSKEY, DS, RRSIG, NSEC, NSEC3 record types with wire read/write
- Signature verification via ring: RSA/SHA-256, ECDSA P-256, Ed25519
- Chain-of-trust: zone DNSKEY → parent DS → root KSK (key tag 20326)
- DNSKEY RRset self-signature verification (RRSIG(DNSKEY) by KSK)
- RRSIG expiration/inception time validation
- NSEC: NXDOMAIN gap proofs, NODATA type absence, wildcard denial
- NSEC3: SHA-1 iterated hashing, closest encloser proof, hash range
- Authority RRSIG verification for denial proofs
- Config: [dnssec] enabled/strict (default false, opt-in)
- AD bit on Secure, SERVFAIL on Bogus+strict
- DnssecStatus cached per entry, ValidationStats logging

Performance:
- TLD chain pre-warmed on startup (root DNSKEY + TLD DS/DNSKEY)
- Referral DS piggybacking from authority sections
- DNSKEY prefetch before validation loop
- Cold-cache validation: ~1 DNSKEY fetch (down from 5)
- Benchmarks: RSA 10.9µs, ECDSA 174ns, DS verify 257ns

Also:
- write_qname fix for root domain "." (was producing malformed queries)
- write_record_header() dedup, write_bytes() bulk writes
- DnsRecord::domain() + query_type() accessors
- UpstreamMode enum, DEFAULT_EDNS_PAYLOAD const
- Real glue TTL (was hardcoded 3600)
- DNSSEC restricted to recursive mode only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: TCP fallback, query minimization, UDP auto-disable

Transport resilience for restrictive networks (ISPs blocking UDP:53):
- DNS-over-TCP fallback: UDP fail/truncation → automatic TCP retry
- UDP auto-disable: after 3 consecutive failures, switch to TCP-first
- IPv6 → TCP directly (UDP socket binds 0.0.0.0, can't reach IPv6)
- Network change resets UDP detection for re-probing
- Root hint rotation in TLD priming

Privacy:
- RFC 7816 query minimization: root servers see TLD only, not full name

Code quality:
- Merged find_starting_ns + find_starting_zone → find_closest_ns
- Extracted resolve_ns_addrs_from_glue shared helper
- Removed overall timeout wrapper (per-hop timeouts sufficient)
- forward_tcp for DNS-over-TCP (RFC 1035 §4.2.2)

Testing:
- Mock TCP-only DNS server for fallback tests (no network needed)
- tcp_fallback_resolves_when_udp_blocked
- tcp_only_iterative_resolution
- tcp_fallback_handles_nxdomain
- udp_auto_disable_resets
- Integration test suite (4 suites, 51 tests)
- Network probe script (tests/network-probe.sh)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: DNSSEC verified badge in dashboard query log

- Add dnssec field to QueryLogEntry, track validation status per query
- DnssecStatus::as_str() for API serialization
- Dashboard shows green checkmark next to DNSSEC-verified responses
- Blog post: add "How keys get there" section, transport resilience section,
  trim code blocks, update What's Next

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use SVG shield for DNSSEC badge, update blog HTML

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: NS cache lookup from authorities, UDP re-probe, shield alignment

- find_closest_ns checks authorities (not just answers) for NS records,
  fixing TLD priming cache misses that caused redundant root queries
- Periodic UDP re-probe every 5min when disabled — re-enables UDP
  after switching from a restrictive network to an open one
- Dashboard DNSSEC shield uses fixed-width container for alignment
- Blog post: tuck key-tag into trust anchor paragraph

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: TCP single-write, mock server consistency, integration tests

- TCP single-write fix: combine length prefix + message to avoid split
  segments that Microsoft/Azure DNS servers reject
- Mock server (spawn_tcp_dns_server) updated to use single-write too
- Tests: forward_tcp_wire_format, forward_tcp_single_segment_write
- Integration: real-server checks for Microsoft/Office/Azure domains

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: recursive bar in dashboard, special-use domain interception

Dashboard:
- Add Recursive bar to resolution paths chart (cyan, distinct from Override)
- Add RECURSIVE path tag style in query log

Special-use domains (RFC 6761/6303/8880/9462):
- .localhost → 127.0.0.1 (RFC 6761)
- Private reverse PTR (10.x, 192.168.x, 172.16-31.x) → NXDOMAIN
- _dns.resolver.arpa (DDR) → NXDOMAIN
- ipv4only.arpa (NAT64) → 192.0.0.170/171
- mDNS service discovery for private ranges → NXDOMAIN

Eliminates ~900ms SERVFAILs for macOS system queries that were
hitting root servers unnecessarily.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: move generated blog HTML to site/blog/posts/, gitignore

- Generated HTML now in site/blog/posts/ (gitignored)
- CI workflow runs pandoc + make blog before deploy
- Updated all internal blog links to /blog/posts/ path
- blog/*.md remains the source of truth

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review feedback — memory ordering, RRSIG time, NS resolution

- Ordering::Relaxed → Acquire/Release for UDP_DISABLED/UDP_FAILURES
  (ARM correctness for cross-thread coordination)
- RRSIG time validation: serial number arithmetic (RFC 4034 §3.1.5)
  + 300s clock skew fudge factor (matches BIND)
- resolve_ns_addrs_from_glue collects addresses from ALL NS names,
  not just the first with glue (improves failover)
- is_special_use_domain: eliminate 16 format! allocations per
  .in-addr.arpa query (parse octet instead)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: API endpoint tests, coverage target

- 8 new axum handler tests: health, stats, query-log, overrides CRUD,
  cache, blocking stats, services CRUD, dashboard HTML
- Tests use tower::oneshot — no network, no server startup
- test_ctx() builds minimal ServerCtx for isolated testing
- `make coverage` target (cargo-tarpaulin), separate from `make all`
- 82 total tests (was 74)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 04:03:47 +02:00
Razvan Dimescu
962b400f4c perf: optimize DNS query hot path (#15)
* perf: optimize hot path — RwLock, inline filtering, pre-allocated strings

- Mutex → RwLock for cache, blocklist, and overrides (concurrent read access)
- Make cache.lookup() and overrides.lookup() take &self (read-only)
- Eliminate 3 Vec allocations per DnsPacket::write() via inline filtering
- Pre-allocate domain strings with capacity 64 in parse path
- Add criterion micro-benchmarks (hot_path + throughput)
- Add bench README documenting both benchmark suites

Measured improvement: ~14% faster parsing, ~9% pipeline throughput,
round-trip cached 733ns → 698ns (~2.3M queries/sec).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: simplify benchmark code after review

- Remove redundant DnsHeader::new() (already set by DnsPacket::new())
- Remove unused DnsHeader import
- Change simulate_cached_pipeline to take &DnsCache (lookup is &self now)
- Remove unnecessary mut on cache in cache_lookup_miss bench

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 02:01:08 +02:00
Razvan Dimescu
c5208e934d feat: DNS-over-HTTPS (DoH) upstream forwarding (#14)
* feat: DNS-over-HTTPS upstream forwarding

Encrypt upstream queries via DoH — ISPs see HTTPS traffic on port 443,
not plaintext DNS on port 53. URL scheme determines transport:
https:// = DoH, bare IP = plain UDP. Falls back to Quad9 DoH when
system resolver cannot be detected.

- Upstream enum (Udp/Doh) with Display and PartialEq
- BytePacketBuffer::from_bytes constructor
- reqwest http2 feature for DoH server compatibility
- network_watch_loop guards against DoH→UDP silent downgrade
- 5 new tests (mock DoH server, HTTP errors, timeout)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add DoH to README — Why Numa, comparison table, roadmap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 00:39:58 +02:00
Razvan Dimescu
e0c1997056 fix: regenerate TLS cert when services change (hot-reload via ArcSwap)
HTTPS proxy certs were generated once at startup. Services added at
runtime via API or LAN discovery got "not secure" in the browser
because their SAN wasn't in the cert. Now the cert is regenerated
on every service add/remove and swapped atomically via ArcSwap.
In-flight connections are unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 16:14:06 +02:00
Razvan Dimescu
c6b35045d8 config visibility, PR review fixes, XSS hardening
Config visibility:
- startup banner shows config path, data dir, services path
- config search: ./numa.toml → ~/.config/numa/ → /usr/local/var/numa/
- /stats API exposes config_path and data_dir, dashboard footer renders them
- GET /ca.pem endpoint serves CA cert for cross-device TLS trust
- load_config returns ConfigLoad with found flag, warns on not-found
- ServerCtx stores PathBuf for config_dir/data_dir, string conversion at boundaries

PR review fixes:
- add explicit parens in resolve_route operator precedence (service_store.rs)
- hostname portability: drop -s flag, trim domain with split('.') (lan.rs)
- serve_ca uses spawn_blocking instead of sync fs::read in async handler
- load_config: remove TOCTOU exists() check, read directly and handle NotFound

XSS hardening:
- HTML-escape all user-controlled interpolations in dashboard (service names,
  route paths, ports, URLs, block check domain/reason)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 12:24:21 +02:00
Razvan Dimescu
41a97bb930 dashboard: show LAN status in Local Services panel header
- Add lan_enabled to ServerCtx
- Add lan field to /stats API (enabled, peer count)
- Dashboard shows "LAN off" (dim) or "LAN on · N peers" (green)
- Tooltip shows enable command or mDNS service type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 11:16:52 +02:00