Commit Graph

131 Commits

Author SHA1 Message Date
Razvan Dimescu
e0e0f50838 feat: distinguish UPSTREAM vs FORWARD in logs and stats
Queries matching a [[forwarding]] suffix rule now log as FORWARD;
queries resolved via the default [upstream] pool log as UPSTREAM.
Previously both paths shared the FORWARD label, making it impossible
to tell from logs whether a rule matched.

Adds QueryPath::Upstream, a queries.upstream stats counter exposed
via /stats, plus a matching dashboard filter, bar, and path tag.

Closes part of #102.
2026-04-14 18:18:32 +03:00
Razvan Dimescu
b4b939c78b fix: accept tls:// and https:// in [[forwarding]] upstreams
Config-level forwarding rules were parsed with the UDP-only
`parse_upstream_addr` helper, silently rejecting the DoT/DoH schemes
that the rest of the forwarding pipeline already supports.

Widen `ForwardingRule.upstream` from `SocketAddr` to `Upstream` so
config rules reuse the same parser as `[upstream].address` and
`fallback`. Demote `parse_upstream_addr` to `pub(crate)` to prevent
the same mistake recurring.

Closes #100.
2026-04-14 09:22:24 +03:00
Razvan Dimescu
d3f046da4c style: assert loopback addr in subdomain test, trim verbose comment 2026-04-13 08:10:26 +03:00
Razvan Dimescu
0bdde40f40 test: verify forwarded response content from mock upstream 2026-04-13 08:07:58 +03:00
Razvan Dimescu
155c1c4da0 test: full-pipeline coverage for every resolve_query step
Test each pipeline stage in isolation through resolve_query:
- override takes precedence over all other paths
- localhost and *.localhost resolve to loopback
- local zone returns configured records
- .tld proxy resolves registered services to loopback
- blocklist sinkholes to 0.0.0.0
- cache hit returns stored response without upstream
2026-04-13 08:04:59 +03:00
Razvan Dimescu
b40004fe5e refactor: extract shared test infrastructure into testutil module
- test_ctx(): single ServerCtx builder, replaces 3 copies (ctx/api/dot)
- mock_upstream(): canned DNS response server for forwarding tests
- blackhole_upstream(): unresponsive socket for timeout tests
- Removes ~100 lines of duplicated 30-field struct literals
2026-04-13 07:56:47 +03:00
Razvan Dimescu
b8ddc16027 refactor: return QueryPath from resolve_query, add mock upstream to tests
resolve_query now returns (BytePacketBuffer, QueryPath) so callers
and tests can inspect the resolution path without reading the query
log. Production call sites (UDP, DoT, DoH) destructure and ignore it.

The forwarding test now uses a mock UDP upstream that replies with a
canned response, asserting QueryPath::Forwarded instead of != Local.
2026-04-13 07:51:14 +03:00
Razvan Dimescu
48f67be2f1 refactor: deduplicate test_ctx by delegating to test_ctx_with_forwarding 2026-04-13 07:39:55 +03:00
Razvan Dimescu
ca00846393 fix: forwarding rules override special-use NXDOMAIN for private PTR zones
Explicit [[forwarding]] rules now take precedence over the RFC 6303
special-use domain intercept. Previously, PTR queries for private
ranges (e.g. 168.192.in-addr.arpa) always returned local NXDOMAIN
even when a forwarding rule pointed them at a corporate DNS server.

Add full-pipeline resolve_query test harness (test_ctx + resolve_in_test)
and two tests covering both the default behavior and the override.

Closes #94
2026-04-13 07:36:53 +03:00
Razvan Dimescu
305935ed98 style: rustfmt strip_port 2026-04-12 23:59:51 +03:00
Razvan Dimescu
bd505813b6 test: verify TLS cert SANs (wildcard, services, loopback, localhost, bare TLD)
Parse the generated DER cert with x509-parser to assert the exact SAN
set, catching silent try_into() failures that a params-level test
would miss.
2026-04-12 23:54:55 +03:00
Razvan Dimescu
115a55b199 fix: bracketed IPv6, localhost SAN, split host-check helpers
- is_doh_host split into strip_port + is_loopback_host + is_tld_match
- strip_port handles bracketed IPv6 ([::1]:443) and rejects bare IPv6
- Add [::1] to accepted loopback hosts, add localhost DNS SAN to cert
- Remove dead sans.is_empty() guard (loopback IPs always present)
2026-04-12 23:54:26 +03:00
Razvan Dimescu
3665deb56b fix: accept loopback addresses for DoH and add IP SANs to TLS cert
The DoH endpoint rejected requests with Host: 127.0.0.1/::1/localhost,
and the generated TLS cert had no IP SANs — so browsers couldn't use
https://127.0.0.1/dns-query even with the CA trusted.

- is_doh_host now accepts 127.0.0.1, ::1, localhost (with optional port)
- TLS cert includes 127.0.0.1 and ::1 IP SANs, plus bare TLD DNS SAN

Closes #87
2026-04-12 23:54:26 +03:00
Razvan Dimescu
2101dfcf17 feat: transport protocol tracking (UDP/TCP/DoT/DoH) with dashboard visualization
Thread Transport enum through resolve pipeline, record per-query
transport in stats and query log. Dashboard gets bar chart panel
with encryption %, transport column in query log, and filter dropdown.
2026-04-12 22:14:26 +03:00
Razvan Dimescu
02e1449a45 feat: enable request hedging for all upstream protocols
Hedging was DoH-only (hyper dispatch spike mitigation). Now applies to
UDP (rescues packet loss) and DoT (rescues TLS handshake stalls) too.
Same-upstream hedging: fires a second independent request after hedge_ms
delay. First response wins. Disable with hedge_ms = 0.
2026-04-12 21:34:47 +03:00
Razvan Dimescu
6d9ee14ea6 refactor: unify warm_stale/warm_domain, remove raw_wire alloc, add Freshness enum
- Extract refresh_entry in ctx.rs — warm_domain in main.rs now delegates
  to it instead of duplicating the resolve+cache logic (~40 lines removed)
- Eliminate unconditional .to_vec() of raw wire on every UDP/DoT query —
  pass &buffer.buf[..len] directly (zero-cost for cache hits)
- Replace bare bool stale flag with Freshness enum (Fresh/NearExpiry/Stale)
  making the three states self-documenting at every call site
2026-04-12 19:56:42 +03:00
Razvan Dimescu
3c49b0e65d fix: deduplicate background refresh with per-domain guard
Multiple stale queries for the same domain now spawn only one background
refresh. A HashSet<(String, QueryType)> on ServerCtx tracks in-flight
refreshes; subsequent stale hits for the same key skip the spawn.
2026-04-12 19:49:23 +03:00
Razvan Dimescu
8ef95383a2 feat: prefetch at <10% TTL remaining, add stale behavior tests
Entries with <10% TTL remaining are now marked stale on lookup,
triggering a background refresh before they expire. Combined with
the serve-stale + background refresh from the previous commit, this
means entries are proactively refreshed — matching Unbound's prefetch
behavior.
2026-04-12 19:46:14 +03:00
Razvan Dimescu
571ce2f013 feat: background refresh on stale cache hit (RFC 8767 revalidation)
When a cached entry is expired but within the 1-hour stale window,
serve it immediately with TTL=1 AND spawn a background re-resolve.
The next query gets a fresh entry instead of another stale serve.

Without this, stale entries were served repeatedly for up to an hour
with no refresh — effectively ignoring TTL.
2026-04-12 19:42:56 +03:00
Razvan Dimescu
043a7e1ba5 feat: raise cache default to 100K entries, evict stalest instead of dropping
The 10K cap was too conservative — the blocklist alone holds 400K domains.
At ~100 bytes per wire entry, 100K entries is ~10MB.

When the cache is full and evict_expired doesn't free enough slots,
evict_stalest removes the entry with the least remaining TTL instead of
silently discarding the new insert.
2026-04-12 19:23:28 +03:00
Razvan Dimescu
05d5a5145f refactor: remove unused extract_question and read_wire_qname from wire.rs 2026-04-12 18:46:03 +03:00
Razvan Dimescu
628ed00074 refactor: extract cache_and_parse, remove dead truncation log, restore TCP_TIMEOUT to 400ms 2026-04-12 18:40:46 +03:00
Razvan Dimescu
85cff052a4 fix: restore TCP_TIMEOUT to 400ms (test race was the real issue) 2026-04-12 18:40:46 +03:00
Razvan Dimescu
67b472fea7 fix: serialize tests that share global UDP_DISABLED state
The tcp_only_iterative_resolution, tcp_fallback_resolves_when_udp_blocked,
tcp_fallback_handles_nxdomain, and udp_auto_disable_resets tests all mutate
global UDP_DISABLED / UDP_FAILURES atomics. Under cargo test parallelism,
udp_auto_disable_resets would reset the flag mid-flight causing other tests
to attempt UDP against TCP-only mock servers and time out.

Fix: static Mutex serializes tests that depend on global UDP state.
Also: tcp_only_iterative_resolution now calls forward_tcp directly,
removing its dependence on the flag entirely.
2026-04-12 18:40:46 +03:00
Razvan Dimescu
700cca9cb6 style: rustfmt warm_domain 2026-04-12 18:40:46 +03:00
Razvan Dimescu
f705f8c49f fix: bump TCP_TIMEOUT to 800ms to fix flaky CI test 2026-04-12 18:40:46 +03:00
Razvan Dimescu
17a1a6ddba refactor: remove forward_with_failover duplication, fix warm-branch hedge bug
- Remove forward_with_failover (parsed): warm_domain now uses _raw + insert_wire
- forward_udp delegates to forward_udp_raw (single UDP socket implementation)
- forward_query uses unified _raw path for all protocols
- Fix send_query_hedged warm branch: bare select! dropped secondary on primary
  error instead of waiting for it — now drains both futures like the cold branch
- Remove pointless raw_len = len rename
2026-04-12 18:40:46 +03:00
Razvan Dimescu
72b540a44a feat: wire-level cache, serve-stale, raw wire passthrough
- Cache stores raw DNS wire bytes + TTL offsets (2.4x memory reduction)
- Serve-stale (RFC 8767): expired entries returned with TTL=1 for 1hr
- handle_query captures raw_len from recv_from for zero-copy forwarding
- resolve_query accepts raw wire bytes, forwards without re-serializing
- wire.rs: TTL offset scanner, ID/TTL patching, question extraction
- 52 wire tests + 16 cache regression tests
2026-04-12 18:40:46 +03:00
Razvan Dimescu
5d9a3a809b feat: DoT client, recursive optimization, bench refactor
- Add DoT forwarding client (tls://IP#hostname upstream config)
- Recursive: cache NS delegations, serve-stale (RFC 8767), parallel
  NS queries on cold, no TCP fallback on individual UDP timeouts,
  400ms NS/TCP timeout (down from 800/1500ms)
- Reduce recursive p99 from 2367ms to 402ms (vs Unbound's 148ms)
- Refactor benchmark suite: generic compare_two engine, delete
  one-off diagnostics (1969 → 750 lines)
- Code cleanup: forward_query delegates to _raw, Option<String>
  for tls_name, saturating_sub for ns_idx
2026-04-12 18:40:46 +03:00
Razvan Dimescu
7efac85836 feat: wire-level forwarding, cache, request hedging, and DoH keepalive
Wire-level forwarding path skips DnsPacket parse/serialize on the hot
path. Cache stores raw wire bytes with pre-scanned TTL offsets — patches
ID + TTLs in-place on lookup instead of cloning parsed packets.

Request hedging (Dean & Barroso "Tail at Scale") fires a second
parallel request after a configurable delay (default 10ms) when
the primary upstream stalls. DoH keepalive loop prevents idle
HTTP/2 + TLS connection teardown.

Recursive resolver now hedges across multiple NS addresses and
caches NS delegation records to skip TLD re-queries.

Integration test harness polls /blocking/stats instead of fixed
sleep, eliminating the blocklist-download race condition.
2026-04-12 18:39:48 +03:00
Razvan Dimescu
05baad0cc0 feat: DoT (DNS over TLS) client upstream
Adds tls:// upstream support for forwarding queries over DNS-over-TLS
(RFC 7858). Parses tls://IP:PORT#hostname format, with default port 853.

- New Upstream::Dot variant with TLS connector
- forward_dot: length-prefixed DNS over TLS stream
- build_dot_connector: system root CAs via webpki-roots
- parse_upstream handles tls:// prefix

Example config:
  address = ["tls://9.9.9.9#dns.quad9.net"]
2026-04-12 18:35:06 +03:00
Razvan Dimescu
7047767dc2 feat: per-suffix conditional forwarding rules (#82) (#84)
* feat: per-suffix conditional forwarding rules in numa.toml (#82)

Adds a `[[forwarding]]` config section so users can explicitly route
domain suffixes to specific upstreams. Config-declared rules take
precedence over auto-discovered rules (macOS scutil, Linux search
domains) via first-match semantics.

Example — the reporter's reverse-DNS case:

  [[forwarding]]
  suffix = "168.192.in-addr.arpa"
  upstream = "100.90.1.63:5361"

Bare IPs default to port 53. IPv6 is supported via
parse_upstream_addr. ForwardingRule::new() constructor replaces
direct struct-literal construction, and make_rule() now delegates
to parse_upstream_addr to fix a latent IPv6 parsing bug.

* feat: accept suffix as string or array in [[forwarding]] rules

Reuses existing string_or_vec deserializer so users can write:
  suffix = ["168.192.in-addr.arpa", "onsite"]
instead of repeating [[forwarding]] blocks per suffix.

* style: rustfmt

* refactor: drop config_count from merge_forwarding_rules return

Log config rules directly from config.forwarding before merging,
keeping the merge API clean of logging concerns.
2026-04-12 06:12:08 +03:00
Razvan Dimescu
22bebb85a0 fix: config path advisory ignores XDG file on interactive root (#81) (#83)
Port-53 and TLS-data-dir advisories told users to create
~/.config/numa/numa.toml, but config_dir() routed root to
/var/lib/numa/ and load_config never consulted the XDG path, so
the file the user created was silently ignored.

New suggested_config_path() helper prefers $HOME/.config/numa/
when HOME is set (and isn't "/" or empty), with config_dir() as
lazy fallback. Used by both advisories and by load_config as an
additional candidate, so the advised path is the path numa
actually reads. Runtime state (services.json, TLS CA) stays in
FHS — config_dir()/data_dir() are intentionally unchanged to
keep continuity with the installed daemon.

End-to-end replication + regression check in
tests/docker/issue-81.sh: four scenarios (replication and
existing-install, each against main and fix), all matching
expectations.
2026-04-12 02:17:33 +03:00
Razvan Dimescu
7d6b0ed568 feat: DoH server endpoint + DoT enabled by default (#79)
* chore: document multi-forwarder and cache warming in config and README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: DNS-over-HTTPS server endpoint (RFC 8484)

Serve DoH at POST /dns-query on the existing HTTPS proxy (port 443).
Automatically enabled when proxy TLS is active — no config needed.
Also fix zone map priority so local zones override RFC 6762 .local
special-use handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove GoatCounter analytics from site

GoatCounter domains (goatcounter.com, gc.zgo.at) are blocked by
Hagezi Pro, which is Numa's default blocklist. A DNS privacy tool
should not embed analytics that its own resolver blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: enable DoT listener by default

DoT now starts automatically with `sudo numa`, matching the proxy and
DoH which are already on by default. The self-signed CA infrastructure
is shared with the proxy, so there is no additional setup. This makes
`numa setup-phone` work out of the box.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 04:06:17 +03:00
Razvan Dimescu
7770129589 feat: cache warming — proactive DNS resolution for configured domains (#78)
Resolves A + AAAA at startup for domains listed in [cache] warm,
then re-resolves before TTL expiry (at 75% elapsed). Keeps critical
domains always hot in cache with zero client-visible latency.

Closes #34 (item 4)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 01:14:04 +03:00
Razvan Dimescu
8abcd91f95 feat: multi-forwarder with SRTT-based failover (#77)
* feat: multi-forwarder with SRTT-based failover

address accepts string or array, with optional per-server port override.
New fallback pool tried only when all primaries fail. Sequential failover
with SRTT ranking ensures fastest upstream is tried first.

Closes #34 (items 1, 2, 3)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: simplify failover candidate list and deduplicate recursive pool

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract maybe_update_primary for testable upstream re-detection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: rustfmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 00:26:58 +03:00
Razvan Dimescu
2c20c56421 feat: mobile setup — QR onboarding, Wi-Fi scoped mobileconfig (#73)
* fix: scope mobileconfig DNS to Wi-Fi only via OnDemandRules

Without OnDemandRules, iOS applies the DoT profile globally —
cellular DNS breaks when the phone leaves the LAN.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: phone setup QR code in dashboard header

- Add /qr endpoint serving SVG (uses existing qrcode crate, svg feature)
- Header popover: QR on desktop, direct download link on mobile viewports
- Only visible when [mobile] enabled = true in config
- Expose mobile.enabled and mobile.port in /stats response
- Lazy-load QR on first click, dismiss on outside click

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add Cache-Control to /qr, re-fetch QR on each popover open

Cache-Control: no-store prevents stale QR after LAN IP change.
Remove qrLoaded flag so the QR always reflects the current IP.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: rustfmt serve_qr response tuple

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add iOS install steps to phone setup popover

iOS shows "Profile Downloaded" with no guidance. The popover
now includes the 3-step install flow including the buried
Certificate Trust Settings toggle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 22:21:51 +03:00
Razvan Dimescu
921ed68d54 fix: allowlist parent domain unblocks subdomains (#74)
* fix: allowlist parent domain unblocks subdomains in blocklist

The allowlist walk-up was interleaved with the blocklist walk-up,
so an exact blocklist match on www.example.com short-circuited
before reaching example.com in the allowlist. Now allowlist is
checked at all parent levels before consulting the blocklist.

Deduplicate is_blocked/check via find_in_set helper; is_blocked
delegates to check. Adds 7 new blocklist tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: rustfmt blocklist tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: zero-alloc is_blocked hot path, normalize trailing dots

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: rustfmt add_to_allowlist

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract normalize() for domain lowering + dot stripping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 21:43:40 +03:00
Razvan Dimescu
de15b32325 feat: numa setup-phone — QR-based mobile DoT onboarding (#38)
* feat: numa setup-phone — QR-based mobile DoT onboarding

Adds a CLI subcommand that generates a one-time mobileconfig profile
containing both the Numa local CA (as a com.apple.security.root payload)
and the DoT DNS settings, then serves it via a temporary HTTP server
and prints a scannable QR code in the terminal.

Flow:
  1. User runs `numa setup-phone` (no sudo needed)
  2. Detects current LAN IP, reads CA from /usr/local/var/numa/ca.pem
  3. Builds combined mobileconfig (CA trust + DoT)
  4. Renders QR code with qrcode crate (Unicode block characters)
  5. Serves the profile on port 8765, stays open until Ctrl+C
  6. Counts successful downloads (multi-device households)

Important caveat documented in instructions: even with the CA bundled
in the profile, iOS still requires the user to manually enable trust
in Settings → General → About → Certificate Trust Settings. Verified
on a real iPhone.

Stable PayloadIdentifiers/UUIDs ensure re-running replaces the
existing profile on iOS rather than accumulating duplicates.

- New module: src/setup_phone.rs (~270 lines)
- New CLI subcommand: `numa setup-phone`
- New dependency: qrcode = "0.14" (default-features = false)
- tokio "signal" feature added for Ctrl+C handling
- 3 unit tests: PEM stripping, mobileconfig generation, QR rendering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: mobile API, enriched /health, mobileconfig module

Adds a persistent read-only HTTP listener (default port 8765, LAN-bound)
serving a dedicated subset of Numa's API for iOS/Android companion apps
and as a replacement for the one-shot server setup_phone used to spin up:

  GET /health           — enriched JSON with version, hostname, LAN IP,
                          SNI, DoT config, mobile API port, CA
                          fingerprint, features (shared handler with
                          the main API on port 5380)
  GET /ca.pem           — public CA certificate (shared handler)
  GET /mobileconfig     — full iOS profile (CA trust + DNS settings
                          pinned to current LAN IP)
  GET /ca.mobileconfig  — CA-only iOS profile (trust anchor without
                          DNS override — for the iOS companion app's
                          programmatic DNS flow via NEDNSSettingsManager)

All routes are idempotent GETs. The mobile API never serves the
state-mutating routes that live on the main API (overrides, blocking
toggle, service CRUD, cache flush), so it is safe to expose on the LAN
regardless of the main API's bind address. The CA private key is never
served by any route.

Opt-in via `[mobile] enabled = true`. Default is false so new installs
do not silently expose a LAN listener after upgrading; our committed
numa.toml template enables it explicitly for spike testing.

New modules:

- src/mobileconfig.rs — ProfileMode::{Full, CaOnly} enum with plist
  builder lifted from setup_phone.rs. Full and CaOnly share the CA
  payload UUID (same trust anchor) but have distinct top-level UUIDs
  so they coexist as separate installable profiles on iOS.

- src/health.rs — HealthMeta cached metadata built once at startup
  from config + CA fingerprint (SHA-256 of the PEM via ring), and the
  HealthResponse JSON shape shared between the main and mobile APIs.

- src/mobile_api.rs — axum Router for the persistent listener. Reuses
  api::health and api::serve_ca from the main API; owns the two
  mobileconfig handlers.

Modified:

- src/api.rs — health() returns the enriched HealthResponse, now pub.
  serve_ca is now pub so mobile_api can reuse it.
- src/config.rs — MobileConfig section (enabled, port, bind_addr).
- src/ctx.rs — health_meta: HealthMeta field on ServerCtx.
- src/main.rs — builds HealthMeta at startup, spawns mobile API
  listener if enabled.
- src/lan.rs — build_announcement takes &HealthMeta and writes
  enriched TXT records (version, api_port, proto, dot_port, ca_fp).
  SRV port now reports the mobile API port; peer discovery still
  reads TXT `services=` so this is backwards compatible. Always
  announces even when no .numa services are registered, so the iOS
  companion app can discover Numa via mDNS regardless of service
  state.
- src/setup_phone.rs — reduced from 267 to 100 lines. The CLI is now
  a thin QR wrapper over the persistent /mobileconfig endpoint; the
  hand-rolled one-shot HTTP server (accept_loop, RUST_OK_HEADERS,
  RUST_NOT_FOUND, download counter) is gone.
- src/dot.rs — test fixture updated with HealthMeta::test_fixture().
- numa.toml — commented [mobile] section, enabled = true for spike.

Tests: 136 unit tests passing (5 new in mobileconfig, 3 new in health).
cargo clippy clean. Integration sanity check: curl'd /health, /ca.pem,
/mobileconfig, /ca.mobileconfig against a running numa — all return
200 with correct content types and valid response bodies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: setup-phone probe, unknown command error, query source in dashboard

- setup-phone now probes the mobile API before printing the QR code
  and shows an actionable error if [mobile] is not enabled
- Unknown CLI subcommands print an error instead of silently
  attempting to start a full server
- Dashboard query log shows source IP under timestamp (localhost
  for loopback, full IP for LAN devices) with full addr on hover

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:08:56 +03:00
Razvan Dimescu
e860731c01 fix: escape DNS label text per RFC 1035 §5.1 (closes #36) (#54)
* fix: escape dots and special characters in DNS label text per RFC 1035 §5.1

Closes #36

read_qname was pushing raw label bytes directly into the output string,
producing ambiguous text for labels containing dots, backslashes, or
non-printable bytes. fanf2 spotted this on HN: wire format
`[8]exa.mple[3]com[0]` (two labels, first containing a literal 0x2E)
was rendered as `exa.mple.com`, indistinguishable from three labels.

Fix both sides of the text representation per RFC 1035 §5.1:

read_qname — when rendering wire bytes to text:
- literal `.` within a label → `\.`
- literal `\` → `\\`
- bytes outside 0x21..=0x7E → `\DDD` (3-digit decimal)
- printable ASCII passes through unchanged

write_qname — when parsing text back to wire:
- `\.` produces a literal 0x2E inside the current label (not a separator)
- `\\` produces a literal 0x5C
- `\DDD` produces the byte with that decimal value (0..=255)
- unescaped `.` still separates labels, empty labels still skipped
- rejects trailing `\`, short `\DD`, and `\DDD` > 255

Impact in practice is low — real-world domains don't contain dots in
labels — but it's a correctness bug in the wire format parser that
could cause round-trip failures with adversarial input. The parser is
the core of the project, so correctness bugs take priority over
practical impact.

Adds 16 unit tests in a new `#[cfg(test)] mod tests` block covering:
plain domain read/write, literal-dot escaping on both sides, backslash
escaping, non-printable + space decimal escapes, full round-trip
preservation, and the three rejection cases for malformed escapes.

Credit: fanf2 (https://news.ycombinator.com/item?id=47612321)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: stream label writes directly into buffer (review feedback)

The first cut of this fix delegated write_qname to a helper
(parse_escaped_labels) that built Vec<Vec<u8>> up-front, then iterated
to emit the wire bytes. On a plain-ASCII domain like "www.google.com"
that's ~4 heap allocations per write_qname call, and record.rs calls
write_qname ~6 times per response — so this PR would regress
bench_buffer_serialize by roughly 24 extra allocations per response
vs. main, where the old non-escaping code had zero.

Rewrite write_qname as a streaming byte-level loop that reserves the
length byte up-front, writes the label body directly into the buffer,
then backpatches the length via set(). Zero intermediate allocations
on the common path, and the 63-byte label cap is now checked
incrementally so oversized labels fail fast.

Byte-level scanning is safe for UTF-8 input: continuation bytes are
always in 0x80..=0xBF, so they can never collide with the ASCII `.`
(0x2E) or `\` (0x5C) that drive label splitting and escape parsing.

Also inline the \DDD rendering in read_qname to avoid the per-byte
format!() allocation on non-printable input. Plain-ASCII reads hit
the unchanged push(c as char) fast path, so the common case has zero
regression.

The parse_escaped_labels helper is deleted — no remaining callers.

All 158 tests pass, clippy + fmt clean. Collapses three review
findings (HIGH allocation regression, MEDIUM format! allocation,
MEDIUM .unwrap() after digit guard) in one pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: route dnssec::name_to_wire through write_qname for escape handling

Closes #55.

dnssec::name_to_wire was a parallel implementation of the old
write_qname's splitting loop — it iterated qname.split('.') and pushed
raw bytes. It predated and duplicated the buffer.rs logic, and it did
not understand RFC 1035 §5.1 text escapes. After the read_qname fix in
this PR, names that come out of read_qname can contain \., \\, or
\DDD sequences; feeding those back into the old name_to_wire would
split on the literal '.' inside a \. sequence and produce corrupt
RRSIG signed-data blobs.

The underlying bug predates this PR — the old read_qname was broken
too, so both sides of the DNSSEC canonical form pipeline were
silently wrong in the same way. Making read_qname correct exposed the
divergence, so it's fixed here in the same PR that introduced the
exposure.

Reimplement name_to_wire on top of BytePacketBuffer::write_qname:
reserve a scratch buffer, let write_qname handle the escape parsing
and length-byte framing, copy the emitted bytes into a Vec, then
walk the wire once more to lowercase label bodies (length bytes stay
untouched). Canonical form per RFC 4034 §6.2 requires the
lowercasing, and it has to happen post-escape-resolution — a
decimal escape like \065 produces 0x41 ('A'), which must be
lowercased to 'a' in the final wire bytes.

Call sites in build_signed_data, record_to_canonical_wire,
record_rdata_canonical, and nsec3_hash are unchanged — the public
signature stays the same, infallible Vec<u8> return.

Tests:
- name_to_wire_escaped_dot_in_label_is_not_a_separator — verifies
  the fanf2 example round-trips correctly through canonical form
- name_to_wire_decimal_escape_is_lowercased — verifies post-escape
  lowercasing (the subtle correctness requirement)
- existing name_to_wire_root, name_to_wire_domain, ds_verification
  tests still pass unchanged

Test count: 158 → 160.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: tighten name_to_wire per review feedback

- Replace hand-rolled per-byte lowercase loop with stdlib
  [u8]::make_ascii_lowercase(). Shorter and idiomatic.
- Tighten the .expect() message to state the actual invariant
  (parseable DNS name) instead of vague "well-formed" language.
- Replace the doc comment's "see #55" with the real invariant —
  issue numbers rot, and by merge time #55 is closed anyway. The
  comment now explains WHY the lowercase pass has to happen
  post-escape-resolution (\065 → 'A' → 'a') instead of during
  write_qname.
- Drop the redundant `\065` test comment (the one-liner version
  is enough with the assertion showing the transform).

No behavior change; 160 tests still pass, clippy + fmt clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: cover label cap and empty-label rollback; trim doc comments

Closes coverage gaps left by PR #54:

- write_rejects_label_over_63_bytes: pins the incremental 63-byte cap
  inside write_qname's inner loop (boundary at 63 vs 64).
- write_skips_empty_labels: pins the rollback branch (pos = len_pos)
  triggered by leading or consecutive dots.

Doc comments tightened:

- write_qname: drop the streaming-impl walkthrough and the escape-grammar
  restatement (already documented on read_qname).
- name_to_wire: drop the implementation explanation; keep the
  post-escape lowercasing rationale, which pins behavior a future
  refactor could silently break.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:53:46 +03:00
Razvan Dimescu
f556b60ce4 fix: suppress recursive hint in install when already configured (#71)
`sudo numa install` unconditionally printed the "Want full DNS
sovereignty?" hint even when numa.toml already has mode = "recursive".
Now loads the config first and skips the message if recursive is
already set.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 08:32:51 +03:00
Razvan Dimescu
422726f1c8 chore(deps): bump rcgen from 0.13 to 0.14 (#70)
rcgen 0.14 replaced the separate Certificate + KeyPair args with a
unified Issuer type. Migrates ensure_ca and generate_service_cert:

- Load path: Issuer::from_ca_cert_der replaces the old
  CertificateParams::from_ca_cert_pem + self_signed round-trip.
- Generate path: Issuer::new(params, key_pair) constructs directly
  from the params used for self_signed (no DER re-parse).
- signed_by takes (&key_pair, &issuer) instead of (&key_pair, &cert, &key).

Also drops thiserror v1 from the dep tree (rcgen 0.14 uses v2).

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 08:28:07 +03:00
Razvan Dimescu
643d6b01e1 fix(linux): consult resolvectl when resolv.conf only shows the stub (#52)
On modern Arch / Ubuntu 22.04+ / Fedora desktops, NetworkManager +
systemd-resolved symlink /etc/resolv.conf to stub-resolv.conf, which
contains only:

  nameserver 127.0.0.53

The real upstream servers (router, ISP, configured DoT providers) live
inside systemd-resolved's per-link state, exposed via 'resolvectl status'.

discover_linux() was parsing /etc/resolv.conf, correctly filtering the
stub address, and then falling through to the Quad9 DoH fallback because
detect_dhcp_dns() is macOS-only on Linux. Net effect: on a large chunk of
Linux installs, numa silently defaulted to Quad9 instead of the user's
actual DNS — visible in Casey's AUR test banner (#33) as
'Upstream https://9.9.9.9/dns-query' despite his machine having working
router DNS the entire time.

resolvectl_dns_server() already exists — it was introduced for cloud VPC
forwarding-rule discovery and knows how to ask systemd-resolved for the
real active DNS server. This commit wires it into the default-upstream
fallback chain, between the primary resolv.conf parse and the
~/.numa/original-resolv.conf backup.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 22:32:57 +03:00
Razvan Dimescu
fab8b698d8 fix: human-readable advisories for TLS data_dir + port-53 EACCES (#48)
* fix: human-readable advisory when TLS data_dir is not writable

When numa runs as non-root on a system with a privileged default
data_dir (e.g. /usr/local/var/numa on macOS), TLS CA setup fails with
a raw "Permission denied (os error 13)" and HTTPS proxy is silently
disabled. The user sees a cryptic warning with no path forward.

Detect std::io::ErrorKind::PermissionDenied on the tls error, print a
diagnostic naming the data_dir and offering two fixes (install as
system resolver, or point data_dir at a writable path), and keep the
graceful-degradation behavior — DNS resolution and plain-HTTP proxy
continue to work without HTTPS.

All other TLS setup errors fall through to the existing log::warn!.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: port-53 advisory also handles EACCES (non-root privileged bind)

The original port-53 match arm only caught EADDRINUSE, so a fresh
non-root user on macOS/Linux hitting EACCES when trying to bind a
privileged port saw the raw OS error instead of the advisory.

Collapse the scoping helper and the advisory into a single
`try_port53_advisory(bind_addr, &io::Error) -> Option<String>` that
returns the formatted diagnostic when both the port is 53 and the
error kind is one we can speak to (AddrInUse or PermissionDenied),
and `None` otherwise. The two failure modes share one body with a
cause-sentence variant — no duplicated fix text.

Caller becomes a plain if-let: no match guard, no separate is_port_53
helper exposed on the public API. is_port_53 goes back to private.

Unit tests cover all branches: AddrInUse, PermissionDenied, non-53
bind_addr, unrelated ErrorKind, and malformed bind_addr.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: move TLS error classification into tls module

main.rs no longer downcasts a boxed error to figure out whether it's
a permission-denied case. tls::try_data_dir_advisory(&err, &dir)
encapsulates the downcast + kind match and returns Some(advisory) or
None, mirroring system_dns::try_port53_advisory. main.rs becomes a
plain if-let, symmetric with the port-53 path.

Trim the docstrings on both advisory functions: they were narrating
the implementation (errno mapping) instead of stating the contract.

Add unit tests for try_data_dir_advisory covering PermissionDenied,
other io::ErrorKind, and non-io errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 16:27:08 +03:00
Razvan Dimescu
a6f23a5ddb fix: advisory + exit(1) when port 53 is already in use (#45) (#47)
* fix: advisory + exit(1) when port 53 is already in use (#45)

Detect AddrInUse on bind, print a human-readable diagnostic explaining
systemd-resolved / Dnscache as the likely cause and offer two concrete
fixes (sudo numa install, or bind_addr on a non-privileged port), then
exit(1) instead of surfacing a raw OS error.

Adds tests/docker/smoke-port53.sh: end-to-end Docker test that
pre-binds port 53 with a Python UDP socket and asserts the advisory +
exit code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: collapse port53 advisory to single flat path

The per-platform cause sentences were cosmetic — they didn't change
the user's actions (install, or bind_addr on a non-privileged port),
but they introduced duplicated "another process..." strings, a
dead-from-CI branch (is_systemd_resolved_active() == true is never
reached by any test), and a pub visibility bump on
is_systemd_resolved_active for a single caller.

Replace with one flat format! whose cause line mentions both
systemd-resolved and the Windows DNS Client inline. The existing
smoke test now exercises 100% of the function.

is_systemd_resolved_active reverts to private.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 15:03:58 +03:00
Razvan Dimescu
79ecb73d87 fix: use FHS-compliant /var/lib/numa as Linux data dir default (#43)
* fix: use FHS-compliant /var/lib/numa as Linux data dir default

numa's default system-wide data directory was hardcoded to
/usr/local/var/numa for all Unix platforms. This is the right path on
macOS (Homebrew prefix convention) but non-FHS on Linux, where Arch /
Fedora / Debian / etc. expect persistent state under /var/lib/<pkg>.
The mismatch was invisible to existing users (numa creates the dir
silently on first run) but immediately surfaces when packaging for a
distro — see PR #33 (community contribution to add an Arch AUR package)
which had to add fragile sed-based path patching at PKGBUILD build time.

The fix moves the path decision into a small helper:

  - daemon_data_dir()        — cfg-gated platform dispatch (linux/macos)
  - resolve_linux_data_dir() — pure function, takes "does X exist?"
                               as parameters, returns the right path

Linux behavior:
  - Fresh install                       → /var/lib/numa (FHS)
  - Upgrading from pre-v0.10.1 install  → /usr/local/var/numa (legacy)
  - Both paths exist                    → /var/lib/numa (FHS wins)

The legacy fallback is critical: existing v0.10.0 Linux users have
their CA cert + services.json under /usr/local/var/numa. Returning
the new path unconditionally would cause CA regeneration on upgrade,
breaking every browser that had trusted the previous CA. The fallback
is checked at startup via std::path::Path::exists, so the upgrade is
seamless and zero-config.

macOS behavior is unchanged — /usr/local/var/numa is still correct
because Homebrew's prefix is /usr/local.

Test coverage:

  - resolve_linux_data_dir is a pure function gated cfg(any(linux,test))
    so the same code path is unit-tested on every platform's CI run.
  - Four tests cover all combinations of (legacy_exists, fhs_exists),
    asserting the migration logic stays correct under future edits.

The default config in numa.toml is also updated to document the new
per-platform default paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: end-to-end FHS path verification + simplify cleanup

Two related changes from a /simplify pass and a follow-up testing
finalization:

1. lib.rs cleanup (no behavior change):
   - Drop FHS_LINUX_DATA_DIR and LEGACY_LINUX_DATA_DIR consts. Both
     were used in only 4 places total and the unit tests already
     bypassed them with string literals, so they were over-engineering.
     Inline the strings in daemon_data_dir() and resolve_linux_data_dir().
   - Trim narrating doc/comments on the helper and the test bodies.
     Keep only the non-obvious WHY (the macOS Homebrew note and the
     migration-keeps-legacy rationale).

2. tests/docker/smoke-arch.sh:
   - Cherry-picked the previously-uncommitted Arch compatibility smoke
     test from feat/smoke-arch.
   - Removed the [server] data_dir = "/tmp/numa-smoke" override from
     the test config so the script now exercises the DEFAULT data dir
     code path — which is exactly what the FHS fix touches.
   - Added a path assertion after the dig succeeds: verify that
     /var/lib/numa/ca.pem exists (FHS) and /usr/local/var/numa is
     absent (no accidental dual-creation on a fresh install).

Verified end-to-end on archlinux:latest (Apple Silicon, Rosetta):

  ── building + running numa on archlinux:latest ──
  ── cargo build --release --locked ──
      Finished `release` profile [optimized] target(s) in 24.02s
  ── dig @127.0.0.1 -p 5354 google.com A ──
    142.251.38.206
  ── FHS path check ──
    ✓ CA cert at /var/lib/numa/ca.pem (FHS path)
    ✓ legacy path /usr/local/var/numa absent (fresh install used FHS)
  ── smoke-arch passed ──

This closes the testing gap where the unit tests covered the
path-decision LOGIC in isolation but nothing exercised the live
wiring on a real Linux filesystem.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:00:27 +03:00
Razvan Dimescu
bf5565ac26 fix: macOS use launchctl bootout/bootstrap instead of deprecated load (#42)
The deprecated `launchctl load -w` returns exit code 0 even when it
cannot actually reload a service whose label is already in launchd's
in-memory state. It prints `Load failed: 5: Input/output error` to
stderr but exits 0, so the install path interprets it as success and
continues — silently leaving the running daemon on whatever binary
was first loaded, even though the on-disk plist now points elsewhere.

The consequence: every macOS user running `brew upgrade numa` rewrites
the plist to point at the new binary, but launchctl never actually
loads it. They think they upgraded; they're still running the old
version. Neither #41 (cross-platform CA trust) nor #40 (self-referential
backup) would actually take effect for them until they manually run:

  sudo launchctl bootout system /Library/LaunchDaemons/com.numa.dns.plist
  sudo launchctl bootstrap system /Library/LaunchDaemons/com.numa.dns.plist

The fix uses the modern API symmetrically across all three call sites:

- install_service_macos: bootout (best-effort cleanup, no-op on first
  install) → bootstrap → wait for readiness → configure DNS
- install_service_macos rollback path: bootout instead of `unload`
- uninstall_service_macos: bootout BEFORE remove_file (the modern API
  needs the plist file path as the specifier; doing it after remove
  would leave the service in memory until reboot)

No new tests — this is a shell-call substitution with no logic to
unit-test. Verified manually on macOS: `sudo numa install` no longer
prints `Load failed`, and the daemon is correctly running the binary
the plist points at.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:54:21 +03:00
Razvan Dimescu
679b346246 fix: prevent self-referential DNS backup on re-install (#40)
* fix: prevent self-referential DNS backup on re-install

The install flow previously captured current system DNS servers
verbatim into the backup file. If numa was already installed, current
DNS was 127.0.0.1, so the "backup" recorded 127.0.0.1 as the "original"
— making a subsequent uninstall a no-op self-reference.

Reproduced 2026-04-08 during v0.10.0 brew dogfood: after
`sudo numa uninstall; sudo /opt/homebrew/bin/numa install`,
`sudo numa uninstall` printed `restored DNS for "Wi-Fi" -> 127.0.0.1`
because the brew binary's install step had overwritten the backup with
the already-stub state.

Fix (all three platforms):
- macOS/Windows: if the existing backup already contains at least one
  non-loopback/non-stub upstream, preserve it as-is. If writing a fresh
  backup, filter loopback/stub addresses first so a capture from
  already-numa-managed state isn't self-referential.
- Linux (resolv.conf fallback path): detect numa-managed or all-loopback
  resolv.conf content and skip the file copy in that case; preserve an
  existing useful backup rather than overwriting it. systemd-resolved
  path is unaffected (uses a drop-in, no backup file).

Adds three unit tests for the predicates: macOS HashMap detection,
Windows interface filter, and resolv.conf parsing (real upstream,
self-referential, numa-marker, systemd stub, mixed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: share iter_nameservers helper and reuse resolv.conf content

Post-review simplifications on the stale-backup fix:

- Extract iter_nameservers(&str) helper used by both parse_resolv_conf
  and resolv_conf_has_real_upstream. Eliminates the duplicated
  line-by-line nameserver parsing (findings from reuse review).
- install_linux: reuse the already-read resolv.conf content via
  std::fs::write instead of a second read via std::fs::copy.
- install_macos / install_windows: flatten the conditional eprintln
  pattern — always print a blank line, conditionally print the save
  message. Equivalent output, less branching.

Net −12 lines. All 130 tests still pass, clippy clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: drop redundant trim before split_whitespace

CI caught `clippy::trim_split_whitespace` on Rust 1.94: `split_whitespace()`
already skips leading/trailing whitespace, so `.trim()` first is redundant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: extract load_backup helper

Remove duplicated read+deserialize boilerplate shared by install_macos
and install_windows. The two call sites each had an identical 4-line
chain of read_to_string().ok().and_then(serde_json::from_str).ok() —
collapse into a single generic helper load_backup<T>().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Revert "refactor: extract load_backup helper"

This reverts commit a54fb99428.

* test: drop windows_backup_filters_loopback

The test inlined the 3-line filter block from install_windows rather
than calling a production helper, so it was testing stdlib Vec::retain
+ is_loopback_or_stub — both already covered elsewhere. Deleting it
removes a test that would silently pass even if install_windows stopped
filtering altogether.

The predicate logic for macOS-shaped backups stays covered by
macos_backup_real_upstream_detection (same inner Vec<String> type).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add windows_backup_filters_loopback unit test

The PR description mentioned this test but it was missing from the
diff, leaving backup_has_real_upstream_windows untested. Mirrors the
shape of macos_backup_real_upstream_detection: empty map → false,
all-loopback (127.0.0.1, ::1, 0.0.0.0) → false, one real entry
alongside loopback → true.

Also relax the cfg gate on backup_has_real_upstream_windows from
cfg(windows) to cfg(any(windows, test)) so the test compiles
cross-platform, matching how backup_has_real_upstream_macos and
the resolv_conf helpers are gated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:38:37 +03:00
Razvan Dimescu
039254280b fix: cross-platform CA trust (Arch/Fedora + Windows) (#41)
* fix: cross-platform CA trust (Arch/Fedora + Windows)

Closes #35.

trust_ca_linux now detects which trust store the distro ships and
runs the matching refresh command, instead of hardcoding Debian's
update-ca-certificates. Detection walks a const table in priority
order, picking the first whose anchor dir exists:

  - debian: /usr/local/share/ca-certificates  (update-ca-certificates)
  - pki:    /etc/pki/ca-trust/source/anchors  (update-ca-trust extract)
  - p11kit: /etc/ca-certificates/trust-source/anchors (trust extract-compat)

Falls back with a clear error listing every backend tried.

Adds Windows support via certutil -addstore Root / -delstore Root,
removing the silent CA-trust gap on numa install (previously the
service installed but the trust step quietly errored, leaving every
HTTPS .numa request throwing browser warnings).

Refactor: trust_ca and untrust_ca are now thin dispatchers calling
per-platform helpers. CA_COMMON_NAME and CA_FILE_NAME are centralized
in tls.rs and reused from system_dns.rs and api.rs. untrust_ca_linux
no longer pre-checks file existence (TOCTOU) and skips the refresh
when no file was actually removed.

Test: tests/docker/install-trust.sh runs the install/uninstall
contract against debian:stable, fedora:latest, and archlinux:latest
in containers, asserting the cert lands in (and is removed from)
the system bundle. All three pass locally.

README notes the Firefox/NSS limitation (separate trust store).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: rustfmt fixes for trust_ca_linux helpers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: macOS CA trust contract test (manual)

Adds tests/manual/install-trust-macos.sh — a sudo bash script that
mirrors trust_ca_macos / untrust_ca_macos against a fixture cert with
a unique CN. Designed to coexist with a running production numa:

- Refuses to run if a real "Numa Local CA" is already in System.keychain
  (fail-closed protection for dogfood installs)
- Uses a unique CN ("Numa Local CA Test <pid-timestamp>") so the test
  cert can never collide with production
- Mirrors the by-hash deletion loop from untrust_ca_macos
- Trap-cleanup on success or interrupt

Lives under tests/manual/ to signal "host-mutating, dev-only" — distinct
from tests/docker/install-trust.sh which is hermetic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: relax bail-out in macOS trust test (safe alongside production)

The bail-out was overly defensive. The test cert uses a unique CN
("Numa Local CA Test <pid-ts>") that is strictly longer than the
production CN, so `security find-certificate -c $TEST_CN` cannot
substring-match the production cert. All deletes are by-hash, which
can only target the test cert's specific hash. Coexistence is
provably safe; document the reasoning in the header comment block
and replace the refusal with an informational notice.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:18:01 +03:00
Razvan Dimescu
6887c8e02e refactor: move data_dir override from env var to [server] TOML field
Reverts the NUMA_DATA_DIR env var added in the previous commit and
replaces it with a [server] data_dir TOML field. Numa already has a
well-developed config system; adding a parallel env-var mechanism
for a single knob was wrong.

The principle: TOML is for application behavior configuration. Env
vars are for bootstrap values (HOME, SUDO_USER to discover paths
before config loads) and standard ecosystem conventions (RUST_LOG).
data_dir is neither — it's an app knob, so it belongs in the TOML.

Changes:
- lib.rs::data_dir() reverts to the platform-specific fallback only
- config.rs adds `data_dir: Option<PathBuf>` to ServerConfig
- main.rs resolves config.server.data_dir with fallback to
  numa::data_dir() and passes it to build_tls_config, then stores the
  resolved path on ctx.data_dir for downstream consumers
- tls.rs::build_tls_config takes `data_dir: &Path` as an explicit
  parameter instead of calling crate::data_dir() behind the caller's
  back. regenerate_tls and dot.rs self_signed_tls now pass
  &ctx.data_dir, honoring whatever path the config resolved to
- tests/integration.sh Suite 6 uses `data_dir = "$NUMA_DATA"` in its
  test TOML instead of the NUMA_DATA_DIR env var prefix
- numa.toml gains a commented-out data_dir example

No behavior change for existing production deployments (the default
path is unchanged). Test harness is now fully config-driven, and
containerized deploys can override data_dir via mount+config without
needing env var injection.

127/127 unit tests pass, Suite 6 passes end-to-end.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00