resolve_query now returns (BytePacketBuffer, QueryPath) so callers
and tests can inspect the resolution path without reading the query
log. Production call sites (UDP, DoT, DoH) destructure and ignore it.
The forwarding test now uses a mock UDP upstream that replies with a
canned response, asserting QueryPath::Forwarded instead of != Local.
Thread Transport enum through resolve pipeline, record per-query
transport in stats and query log. Dashboard gets bar chart panel
with encryption %, transport column in query log, and filter dropdown.
- Extract refresh_entry in ctx.rs — warm_domain in main.rs now delegates
to it instead of duplicating the resolve+cache logic (~40 lines removed)
- Eliminate unconditional .to_vec() of raw wire on every UDP/DoT query —
pass &buffer.buf[..len] directly (zero-cost for cache hits)
- Replace bare bool stale flag with Freshness enum (Fresh/NearExpiry/Stale)
making the three states self-documenting at every call site
Multiple stale queries for the same domain now spawn only one background
refresh. A HashSet<(String, QueryType)> on ServerCtx tracks in-flight
refreshes; subsequent stale hits for the same key skip the spawn.
When a cached entry is expired but within the 1-hour stale window,
serve it immediately with TTL=1 AND spawn a background re-resolve.
The next query gets a fresh entry instead of another stale serve.
Without this, stale entries were served repeatedly for up to an hour
with no refresh — effectively ignoring TTL.
Wire-level forwarding path skips DnsPacket parse/serialize on the hot
path. Cache stores raw wire bytes with pre-scanned TTL offsets — patches
ID + TTLs in-place on lookup instead of cloning parsed packets.
Request hedging (Dean & Barroso "Tail at Scale") fires a second
parallel request after a configurable delay (default 10ms) when
the primary upstream stalls. DoH keepalive loop prevents idle
HTTP/2 + TLS connection teardown.
Recursive resolver now hedges across multiple NS addresses and
caches NS delegation records to skip TLD re-queries.
Integration test harness polls /blocking/stats instead of fixed
sleep, eliminating the blocklist-download race condition.
* feat: multi-forwarder with SRTT-based failover
address accepts string or array, with optional per-server port override.
New fallback pool tried only when all primaries fail. Sequential failover
with SRTT ranking ensures fastest upstream is tried first.
Closes#34 (items 1, 2, 3)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: simplify failover candidate list and deduplicate recursive pool
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: extract maybe_update_primary for testable upstream re-detection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: rustfmt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: scope mobileconfig DNS to Wi-Fi only via OnDemandRules
Without OnDemandRules, iOS applies the DoT profile globally —
cellular DNS breaks when the phone leaves the LAN.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: phone setup QR code in dashboard header
- Add /qr endpoint serving SVG (uses existing qrcode crate, svg feature)
- Header popover: QR on desktop, direct download link on mobile viewports
- Only visible when [mobile] enabled = true in config
- Expose mobile.enabled and mobile.port in /stats response
- Lazy-load QR on first click, dismiss on outside click
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add Cache-Control to /qr, re-fetch QR on each popover open
Cache-Control: no-store prevents stale QR after LAN IP change.
Remove qrLoaded flag so the QR always reflects the current IP.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: rustfmt serve_qr response tuple
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add iOS install steps to phone setup popover
iOS shows "Profile Downloaded" with no guidance. The popover
now includes the 3-step install flow including the buried
Certificate Trust Settings toggle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: numa setup-phone — QR-based mobile DoT onboarding
Adds a CLI subcommand that generates a one-time mobileconfig profile
containing both the Numa local CA (as a com.apple.security.root payload)
and the DoT DNS settings, then serves it via a temporary HTTP server
and prints a scannable QR code in the terminal.
Flow:
1. User runs `numa setup-phone` (no sudo needed)
2. Detects current LAN IP, reads CA from /usr/local/var/numa/ca.pem
3. Builds combined mobileconfig (CA trust + DoT)
4. Renders QR code with qrcode crate (Unicode block characters)
5. Serves the profile on port 8765, stays open until Ctrl+C
6. Counts successful downloads (multi-device households)
Important caveat documented in instructions: even with the CA bundled
in the profile, iOS still requires the user to manually enable trust
in Settings → General → About → Certificate Trust Settings. Verified
on a real iPhone.
Stable PayloadIdentifiers/UUIDs ensure re-running replaces the
existing profile on iOS rather than accumulating duplicates.
- New module: src/setup_phone.rs (~270 lines)
- New CLI subcommand: `numa setup-phone`
- New dependency: qrcode = "0.14" (default-features = false)
- tokio "signal" feature added for Ctrl+C handling
- 3 unit tests: PEM stripping, mobileconfig generation, QR rendering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: mobile API, enriched /health, mobileconfig module
Adds a persistent read-only HTTP listener (default port 8765, LAN-bound)
serving a dedicated subset of Numa's API for iOS/Android companion apps
and as a replacement for the one-shot server setup_phone used to spin up:
GET /health — enriched JSON with version, hostname, LAN IP,
SNI, DoT config, mobile API port, CA
fingerprint, features (shared handler with
the main API on port 5380)
GET /ca.pem — public CA certificate (shared handler)
GET /mobileconfig — full iOS profile (CA trust + DNS settings
pinned to current LAN IP)
GET /ca.mobileconfig — CA-only iOS profile (trust anchor without
DNS override — for the iOS companion app's
programmatic DNS flow via NEDNSSettingsManager)
All routes are idempotent GETs. The mobile API never serves the
state-mutating routes that live on the main API (overrides, blocking
toggle, service CRUD, cache flush), so it is safe to expose on the LAN
regardless of the main API's bind address. The CA private key is never
served by any route.
Opt-in via `[mobile] enabled = true`. Default is false so new installs
do not silently expose a LAN listener after upgrading; our committed
numa.toml template enables it explicitly for spike testing.
New modules:
- src/mobileconfig.rs — ProfileMode::{Full, CaOnly} enum with plist
builder lifted from setup_phone.rs. Full and CaOnly share the CA
payload UUID (same trust anchor) but have distinct top-level UUIDs
so they coexist as separate installable profiles on iOS.
- src/health.rs — HealthMeta cached metadata built once at startup
from config + CA fingerprint (SHA-256 of the PEM via ring), and the
HealthResponse JSON shape shared between the main and mobile APIs.
- src/mobile_api.rs — axum Router for the persistent listener. Reuses
api::health and api::serve_ca from the main API; owns the two
mobileconfig handlers.
Modified:
- src/api.rs — health() returns the enriched HealthResponse, now pub.
serve_ca is now pub so mobile_api can reuse it.
- src/config.rs — MobileConfig section (enabled, port, bind_addr).
- src/ctx.rs — health_meta: HealthMeta field on ServerCtx.
- src/main.rs — builds HealthMeta at startup, spawns mobile API
listener if enabled.
- src/lan.rs — build_announcement takes &HealthMeta and writes
enriched TXT records (version, api_port, proto, dot_port, ca_fp).
SRV port now reports the mobile API port; peer discovery still
reads TXT `services=` so this is backwards compatible. Always
announces even when no .numa services are registered, so the iOS
companion app can discover Numa via mDNS regardless of service
state.
- src/setup_phone.rs — reduced from 267 to 100 lines. The CLI is now
a thin QR wrapper over the persistent /mobileconfig endpoint; the
hand-rolled one-shot HTTP server (accept_loop, RUST_OK_HEADERS,
RUST_NOT_FOUND, download counter) is gone.
- src/dot.rs — test fixture updated with HealthMeta::test_fixture().
- numa.toml — commented [mobile] section, enabled = true for spike.
Tests: 136 unit tests passing (5 new in mobileconfig, 3 new in health).
cargo clippy clean. Integration sanity check: curl'd /health, /ca.pem,
/mobileconfig, /ca.mobileconfig against a running numa — all return
200 with correct content types and valid response bodies.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: setup-phone probe, unknown command error, query source in dashboard
- setup-phone now probes the mobile API before printing the QR code
and shows an actionable error if [mobile] is not enabled
- Unknown CLI subcommands print an error instead of silently
attempting to start a full server
- Dashboard query log shows source IP under timestamp (localhost
for loopback, full IP for LAN devices) with full addr on hover
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverts the NUMA_DATA_DIR env var added in the previous commit and
replaces it with a [server] data_dir TOML field. Numa already has a
well-developed config system; adding a parallel env-var mechanism
for a single knob was wrong.
The principle: TOML is for application behavior configuration. Env
vars are for bootstrap values (HOME, SUDO_USER to discover paths
before config loads) and standard ecosystem conventions (RUST_LOG).
data_dir is neither — it's an app knob, so it belongs in the TOML.
Changes:
- lib.rs::data_dir() reverts to the platform-specific fallback only
- config.rs adds `data_dir: Option<PathBuf>` to ServerConfig
- main.rs resolves config.server.data_dir with fallback to
numa::data_dir() and passes it to build_tls_config, then stores the
resolved path on ctx.data_dir for downstream consumers
- tls.rs::build_tls_config takes `data_dir: &Path` as an explicit
parameter instead of calling crate::data_dir() behind the caller's
back. regenerate_tls and dot.rs self_signed_tls now pass
&ctx.data_dir, honoring whatever path the config resolved to
- tests/integration.sh Suite 6 uses `data_dir = "$NUMA_DATA"` in its
test TOML instead of the NUMA_DATA_DIR env var prefix
- numa.toml gains a commented-out data_dir example
No behavior change for existing production deployments (the default
path is unchanged). Test harness is now fully config-driven, and
containerized deploys can override data_dir via mount+config without
needing env var injection.
127/127 unit tests pass, Suite 6 passes end-to-end.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds tests/integration.sh Suite 5 (DoT via kdig + openssl) and
fixes a startup panic caught by it.
Bug: when [dot] cert_path/key_path was set AND [proxy] was disabled,
numa panicked on the first DoT handshake with "Could not
automatically determine the process-level CryptoProvider from Rustls
crate features". In normal deployments the proxy's build_tls_config
installs the default provider as a side effect, masking the missing
call in dot.rs::load_tls_config. Disable the proxy and the panic
surfaces. Fix: call
rustls::crypto::ring::default_provider().install_default() at the
top of load_tls_config (no-op if already installed).
Suite 5 exercises:
- DoT listener binds on configured port
- Resolves a local zone A record over TLS (kdig +tls)
- Persistent connection reuse (kdig +keepopen, 3 queries, 1 handshake)
- ALPN "dot" negotiation (openssl s_client -alpn dot)
- ALPN mismatch rejected with no_application_protocol (openssl -alpn h2)
Uses a pre-generated cert at /tmp so the test runs non-root.
Skips gracefully if kdig or openssl aren't installed.
Also: Dockerfile now EXPOSE 853/tcp so docker run -p 853:853 works
out of the box when users enable DoT.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds dot_rejects_non_dot_alpn to assert the rustls server enforces
ALPN strictness rather than silently accepting a mismatched
negotiation. This is the load-bearing behavior behind the cross-
protocol confusion defense — without enforcement, the ALPN "dot"
advertisement is just a sign hung on an unlocked door.
Refactors test_tls_configs to return the leaf cert DER instead of a
prebuilt client config, and adds a dot_client(cert_der, alpn) helper
so each test can build a client config with the ALPN list it needs.
The five existing DoT tests gain one line each to call dot_client
with dot_alpn(); behavior unchanged.
127/127 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
self_signed_tls was passing an empty service_names list, so the
generated cert only had the *.numa wildcard SAN. Strict TLS clients
(browsers, possibly some iOS versions) reject wildcards under
single-label TLDs — see the existing comment in tls.rs explaining
why the proxy lists each service explicitly.
setup-phone's mobileconfig sends ServerName "numa.numa" as SNI, so
the DoT cert must have an explicit numa.numa SAN. Pass proxy_tld
itself as a service name, mirroring how main.rs already registers
"numa" as a service for the proxy's TLS cert.
Test fixture updated to mirror the production SAN shape (*.numa +
numa.numa) and switched the client to SNI "numa.numa", so the
existing DoT test suite implicitly exercises the SNI path used by
setup-phone clients.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both were restating what the code already said — dot_alpn's doc
narrated the function name and the test comment restated the
assertion. RFC 7858 §3.2 is already cited on self_signed_tls and
build_tls_config where the "why" actually matters.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two DoS/interop hardening items:
1. Bound write_framed by WRITE_TIMEOUT (10s) so a slow-reader
attacker can't indefinitely hold a worker task and its connection
permit. Symmetric to the existing handshake timeout.
2. Advertise ALPN "dot" per RFC 7858 §3.2. Required by some strict
DoT clients (newer Apple stacks, some Android versions). rustls
ServerConfig exposes alpn_protocols as a pub field so we set it
after with_single_cert:
- load_tls_config (user-provided cert/key): set directly
- self_signed_tls (new, replaces fallback_tls): builds a fresh
DoT-specific TLS config via build_tls_config with the ALPN list
build_tls_config now takes an `alpn: Vec<Vec<u8>>` parameter so
DoT and the proxy can pass different ALPN lists while sharing the
same CA. Proxy callers pass Vec::new() (unchanged behavior).
Dropped the ctx.tls_config reuse branch: we can't mutate a shared
Arc<ServerConfig> to add DoT-specific ALPN, and reusing the proxy
config was already quietly broken re: SAN (proxy cert covers
*.{tld}, not the DoT server's bind hostname/IP).
Added dot_negotiates_alpn test that asserts conn.alpn_protocol()
returns Some(b"dot") after handshake. 126/126 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Collapse two 4-arm read/timeout matches to let-else (lose one
defensive debug log on payload-read timeout; idle timeouts are
routine on persistent DoT connections anyway)
- Drop MIN_MSG_LEN: DnsPacket::from_buffer rejects truncated input
on its own, and BytePacketBuffer is zero-init so buf[0..2] for
sub-2-byte messages just yields a harmless FORMERR with id=0
- Inline ACCEPT_ERROR_BACKOFF (single use site)
- Drop the partial cert/key warning: missing one of cert_path/
key_path silently falls back to self-signed; users see the
self-signed cert at startup and figure it out
- Drop dot_localhost_resolution test: RFC 6761 localhost is tested
in ctx.rs; this test only verified DoT transport, which
dot_resolves_local_zone already covers
- Drop self-documenting comment in dot_multiple_queries_on_persistent_connection
Net -32 lines, 125/125 tests pass, no behavior change users would notice.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Flatten 4-arm cert/key match in start_dot to 2 arms with the
partial-config warning hoisted into a one-liner above the match.
- Extract send_response() that serializes a DnsPacket and writes it
framed, used by both the FORMERR-on-parse-error and SERVFAIL-on-
resolve-error paths. Removes duplicated buffer/write/log boilerplate
and unifies the rescode logging via {:?}.
No behavior change; 126/126 tests still pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address review findings on PR #25:
- Refactor resolve_query to take a pre-parsed DnsPacket. Parse-error
handling moves to the UDP caller, eliminating the double warn! line
on malformed UDP queries.
- Enforce MIN_MSG_LEN=12 (DNS header) in handle_dot_connection so
query_id extraction is always reading client-sent bytes, not the
zeroed buffer tail.
- Parse the DoT query before calling resolve_query and retain it, so
SERVFAIL responses can echo the original question section via
response_from(). Parse failures send FORMERR with the client id.
- Extract write_framed() helper for length-prefix + flush, reused by
success, SERVFAIL, and FORMERR paths.
- Back off 100ms on listener.accept() errors to avoid tight-looping
on fd exhaustion.
- Replace the hardcoded 127.0.0.1:53 upstream in dot_nxdomain_for_unknown
with a bound-but-unresponsive UDP socket owned by the test, making it
independent of the host's local resolver. Test now runs in ~220ms
(timeout lowered to 200ms) instead of 3s and asserts the question is
echoed in the SERVFAIL response.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 10s timeout on TLS handshake — prevents clients from holding a
semaphore permit without completing the handshake
- Add IDLE_TIMEOUT on payload read_exact — prevents slowloris after
sending a valid length prefix then trickling bytes
- Extract accept_loop() shared between start_dot and tests — eliminates
duplicated accept logic that could drift
- Add 5s timeout on TCP reads in recursive test mock server
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Send SERVFAIL response (with correct query ID) when resolve_query
fails, preventing DoT clients from hanging until idle timeout
- Extract handle_dot_connection() so tests use the same logic as
production, eliminating duplicated accept/read/resolve loop
- Replace magic 4096 with named MAX_MSG_LEN constant tied to BUF_SIZE
- Add flush() after each TLS write to prevent buffered responses
- Extract fallback_tls() helper, handle partial cert/key config,
support IPv6 bind address, remove redundant crypto provider init
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Refactor handle_query into transport-agnostic resolve_query that returns
a BytePacketBuffer, keeping the UDP path zero-alloc. Add a TLS listener
on port 853 with persistent connections, idle timeout, connection limits,
and coalesced writes. Supports user-provided certs or self-signed CA
fallback. Includes 5 integration tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>