Commit Graph

40 Commits

Author SHA1 Message Date
Razvan Dimescu
025318340f fix: setup-phone probe, unknown command error, query source in dashboard
- setup-phone now probes the mobile API before printing the QR code
  and shows an actionable error if [mobile] is not enabled
- Unknown CLI subcommands print an error instead of silently
  attempting to start a full server
- Dashboard query log shows source IP under timestamp (localhost
  for loopback, full IP for LAN devices) with full addr on hover

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:02:52 +03:00
Razvan Dimescu
ca04157229 feat: mobile API, enriched /health, mobileconfig module
Adds a persistent read-only HTTP listener (default port 8765, LAN-bound)
serving a dedicated subset of Numa's API for iOS/Android companion apps
and as a replacement for the one-shot server setup_phone used to spin up:

  GET /health           — enriched JSON with version, hostname, LAN IP,
                          SNI, DoT config, mobile API port, CA
                          fingerprint, features (shared handler with
                          the main API on port 5380)
  GET /ca.pem           — public CA certificate (shared handler)
  GET /mobileconfig     — full iOS profile (CA trust + DNS settings
                          pinned to current LAN IP)
  GET /ca.mobileconfig  — CA-only iOS profile (trust anchor without
                          DNS override — for the iOS companion app's
                          programmatic DNS flow via NEDNSSettingsManager)

All routes are idempotent GETs. The mobile API never serves the
state-mutating routes that live on the main API (overrides, blocking
toggle, service CRUD, cache flush), so it is safe to expose on the LAN
regardless of the main API's bind address. The CA private key is never
served by any route.

Opt-in via `[mobile] enabled = true`. Default is false so new installs
do not silently expose a LAN listener after upgrading; our committed
numa.toml template enables it explicitly for spike testing.

New modules:

- src/mobileconfig.rs — ProfileMode::{Full, CaOnly} enum with plist
  builder lifted from setup_phone.rs. Full and CaOnly share the CA
  payload UUID (same trust anchor) but have distinct top-level UUIDs
  so they coexist as separate installable profiles on iOS.

- src/health.rs — HealthMeta cached metadata built once at startup
  from config + CA fingerprint (SHA-256 of the PEM via ring), and the
  HealthResponse JSON shape shared between the main and mobile APIs.

- src/mobile_api.rs — axum Router for the persistent listener. Reuses
  api::health and api::serve_ca from the main API; owns the two
  mobileconfig handlers.

Modified:

- src/api.rs — health() returns the enriched HealthResponse, now pub.
  serve_ca is now pub so mobile_api can reuse it.
- src/config.rs — MobileConfig section (enabled, port, bind_addr).
- src/ctx.rs — health_meta: HealthMeta field on ServerCtx.
- src/main.rs — builds HealthMeta at startup, spawns mobile API
  listener if enabled.
- src/lan.rs — build_announcement takes &HealthMeta and writes
  enriched TXT records (version, api_port, proto, dot_port, ca_fp).
  SRV port now reports the mobile API port; peer discovery still
  reads TXT `services=` so this is backwards compatible. Always
  announces even when no .numa services are registered, so the iOS
  companion app can discover Numa via mDNS regardless of service
  state.
- src/setup_phone.rs — reduced from 267 to 100 lines. The CLI is now
  a thin QR wrapper over the persistent /mobileconfig endpoint; the
  hand-rolled one-shot HTTP server (accept_loop, RUST_OK_HEADERS,
  RUST_NOT_FOUND, download counter) is gone.
- src/dot.rs — test fixture updated with HealthMeta::test_fixture().
- numa.toml — commented [mobile] section, enabled = true for spike.

Tests: 136 unit tests passing (5 new in mobileconfig, 3 new in health).
cargo clippy clean. Integration sanity check: curl'd /health, /ca.pem,
/mobileconfig, /ca.mobileconfig against a running numa — all return
200 with correct content types and valid response bodies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 18:05:48 +03:00
Razvan Dimescu
3e0e85a761 feat: numa setup-phone — QR-based mobile DoT onboarding
Adds a CLI subcommand that generates a one-time mobileconfig profile
containing both the Numa local CA (as a com.apple.security.root payload)
and the DoT DNS settings, then serves it via a temporary HTTP server
and prints a scannable QR code in the terminal.

Flow:
  1. User runs `numa setup-phone` (no sudo needed)
  2. Detects current LAN IP, reads CA from /usr/local/var/numa/ca.pem
  3. Builds combined mobileconfig (CA trust + DoT)
  4. Renders QR code with qrcode crate (Unicode block characters)
  5. Serves the profile on port 8765, stays open until Ctrl+C
  6. Counts successful downloads (multi-device households)

Important caveat documented in instructions: even with the CA bundled
in the profile, iOS still requires the user to manually enable trust
in Settings → General → About → Certificate Trust Settings. Verified
on a real iPhone.

Stable PayloadIdentifiers/UUIDs ensure re-running replaces the
existing profile on iOS rather than accumulating duplicates.

- New module: src/setup_phone.rs (~270 lines)
- New CLI subcommand: `numa setup-phone`
- New dependency: qrcode = "0.14" (default-features = false)
- tokio "signal" feature added for Ctrl+C handling
- 3 unit tests: PEM stripping, mobileconfig generation, QR rendering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 18:04:57 +03:00
Razvan Dimescu
fab8b698d8 fix: human-readable advisories for TLS data_dir + port-53 EACCES (#48)
* fix: human-readable advisory when TLS data_dir is not writable

When numa runs as non-root on a system with a privileged default
data_dir (e.g. /usr/local/var/numa on macOS), TLS CA setup fails with
a raw "Permission denied (os error 13)" and HTTPS proxy is silently
disabled. The user sees a cryptic warning with no path forward.

Detect std::io::ErrorKind::PermissionDenied on the tls error, print a
diagnostic naming the data_dir and offering two fixes (install as
system resolver, or point data_dir at a writable path), and keep the
graceful-degradation behavior — DNS resolution and plain-HTTP proxy
continue to work without HTTPS.

All other TLS setup errors fall through to the existing log::warn!.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: port-53 advisory also handles EACCES (non-root privileged bind)

The original port-53 match arm only caught EADDRINUSE, so a fresh
non-root user on macOS/Linux hitting EACCES when trying to bind a
privileged port saw the raw OS error instead of the advisory.

Collapse the scoping helper and the advisory into a single
`try_port53_advisory(bind_addr, &io::Error) -> Option<String>` that
returns the formatted diagnostic when both the port is 53 and the
error kind is one we can speak to (AddrInUse or PermissionDenied),
and `None` otherwise. The two failure modes share one body with a
cause-sentence variant — no duplicated fix text.

Caller becomes a plain if-let: no match guard, no separate is_port_53
helper exposed on the public API. is_port_53 goes back to private.

Unit tests cover all branches: AddrInUse, PermissionDenied, non-53
bind_addr, unrelated ErrorKind, and malformed bind_addr.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: move TLS error classification into tls module

main.rs no longer downcasts a boxed error to figure out whether it's
a permission-denied case. tls::try_data_dir_advisory(&err, &dir)
encapsulates the downcast + kind match and returns Some(advisory) or
None, mirroring system_dns::try_port53_advisory. main.rs becomes a
plain if-let, symmetric with the port-53 path.

Trim the docstrings on both advisory functions: they were narrating
the implementation (errno mapping) instead of stating the contract.

Add unit tests for try_data_dir_advisory covering PermissionDenied,
other io::ErrorKind, and non-io errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 16:27:08 +03:00
Razvan Dimescu
a6f23a5ddb fix: advisory + exit(1) when port 53 is already in use (#45) (#47)
* fix: advisory + exit(1) when port 53 is already in use (#45)

Detect AddrInUse on bind, print a human-readable diagnostic explaining
systemd-resolved / Dnscache as the likely cause and offer two concrete
fixes (sudo numa install, or bind_addr on a non-privileged port), then
exit(1) instead of surfacing a raw OS error.

Adds tests/docker/smoke-port53.sh: end-to-end Docker test that
pre-binds port 53 with a Python UDP socket and asserts the advisory +
exit code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: collapse port53 advisory to single flat path

The per-platform cause sentences were cosmetic — they didn't change
the user's actions (install, or bind_addr on a non-privileged port),
but they introduced duplicated "another process..." strings, a
dead-from-CI branch (is_systemd_resolved_active() == true is never
reached by any test), and a pub visibility bump on
is_systemd_resolved_active for a single caller.

Replace with one flat format! whose cause line mentions both
systemd-resolved and the Windows DNS Client inline. The existing
smoke test now exercises 100% of the function.

is_systemd_resolved_active reverts to private.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 15:03:58 +03:00
Razvan Dimescu
6887c8e02e refactor: move data_dir override from env var to [server] TOML field
Reverts the NUMA_DATA_DIR env var added in the previous commit and
replaces it with a [server] data_dir TOML field. Numa already has a
well-developed config system; adding a parallel env-var mechanism
for a single knob was wrong.

The principle: TOML is for application behavior configuration. Env
vars are for bootstrap values (HOME, SUDO_USER to discover paths
before config loads) and standard ecosystem conventions (RUST_LOG).
data_dir is neither — it's an app knob, so it belongs in the TOML.

Changes:
- lib.rs::data_dir() reverts to the platform-specific fallback only
- config.rs adds `data_dir: Option<PathBuf>` to ServerConfig
- main.rs resolves config.server.data_dir with fallback to
  numa::data_dir() and passes it to build_tls_config, then stores the
  resolved path on ctx.data_dir for downstream consumers
- tls.rs::build_tls_config takes `data_dir: &Path` as an explicit
  parameter instead of calling crate::data_dir() behind the caller's
  back. regenerate_tls and dot.rs self_signed_tls now pass
  &ctx.data_dir, honoring whatever path the config resolved to
- tests/integration.sh Suite 6 uses `data_dir = "$NUMA_DATA"` in its
  test TOML instead of the NUMA_DATA_DIR env var prefix
- numa.toml gains a commented-out data_dir example

No behavior change for existing production deployments (the default
path is unchanged). Test harness is now fully config-driven, and
containerized deploys can override data_dir via mount+config without
needing env var injection.

127/127 unit tests pass, Suite 6 passes end-to-end.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
1632fc36f2 feat: DoT write timeout and ALPN "dot" advertisement
Two DoS/interop hardening items:

1. Bound write_framed by WRITE_TIMEOUT (10s) so a slow-reader
   attacker can't indefinitely hold a worker task and its connection
   permit. Symmetric to the existing handshake timeout.

2. Advertise ALPN "dot" per RFC 7858 §3.2. Required by some strict
   DoT clients (newer Apple stacks, some Android versions). rustls
   ServerConfig exposes alpn_protocols as a pub field so we set it
   after with_single_cert:
   - load_tls_config (user-provided cert/key): set directly
   - self_signed_tls (new, replaces fallback_tls): builds a fresh
     DoT-specific TLS config via build_tls_config with the ALPN list

   build_tls_config now takes an `alpn: Vec<Vec<u8>>` parameter so
   DoT and the proxy can pass different ALPN lists while sharing the
   same CA. Proxy callers pass Vec::new() (unchanged behavior).

   Dropped the ctx.tls_config reuse branch: we can't mutate a shared
   Arc<ServerConfig> to add DoT-specific ALPN, and reusing the proxy
   config was already quietly broken re: SAN (proxy cert covers
   *.{tld}, not the DoT server's bind hostname/IP).

Added dot_negotiates_alpn test that asserts conn.alpn_protocol()
returns Some(b"dot") after handshake. 126/126 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
e4350ae81c feat: add DNS-over-TLS (DoT) listener (RFC 7858)
Refactor handle_query into transport-agnostic resolve_query that returns
a BytePacketBuffer, keeping the UDP path zero-alloc. Add a TLS listener
on port 853 with persistent connections, idle timeout, connection limits,
and coalesced writes. Supports user-provided certs or self-signed CA
fallback. Includes 5 integration tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
0b883d1c0d feat: Windows DNS configuration via netsh (#28)
* feat: Windows DNS configuration via netsh

numa install/uninstall now set/restore system DNS on Windows via
netsh. Parses ipconfig /all per-interface (adapter name, DHCP status,
DNS servers), saves backup to %APPDATA%\numa\original-dns.json, and
restores on uninstall (DHCP or static with secondary servers).

Handles localization (German adapter/DHCP/DNS labels), disconnected
adapters, multiple interfaces, and missing admin privileges. Adds IP
validation to discover_windows() for consistency.

No Windows Service or CA trust yet — user runs numa in a terminal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: add cargo test to Windows CI job

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: upload Windows binary as artifact for testing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: SRTT decay tests panic on Windows due to Instant underflow

On Windows, Instant starts near boot time — subtracting large
durations panics. Use checked_sub with a process-start fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: SRTT decay tests use binary search for max Instant age

Replace age() helper with set_age_secs() on SrttCache that
binary-searches for the maximum subtractable duration. Prevents
panic on Windows (Instant starts at boot) while still producing
the oldest representable instant for correct decay calculations.

Also removes ephemeral test-ubuntu.sh from git.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use ProgramData for Windows DNS backup path

APPDATA differs between user and admin contexts — install runs as
admin but uninstall might resolve a different APPDATA. Use
ProgramData which is consistent across elevation contexts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: disable Dnscache on Windows install, re-enable on uninstall

Windows DNS Client (Dnscache) holds port 53 at kernel level and
can't be stopped via sc/net stop. Disable via registry during
install (requires reboot), re-enable on uninstall.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: rewrite SRTT decay tests as pure functions

Decay tests manipulated Instant timestamps which panics on Windows
(Instant can't go before boot time). Rewrite to test decay_for_age()
directly — a pure function taking srtt_ms and age_secs, no platform
dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use Quad9 IP (9.9.9.9) for DoH fallback, not hostname

DoH to dns.quad9.net requires DNS to resolve the hostname, which
creates a chicken-and-egg loop when numa IS the system resolver
(e.g. after numa install on Windows). Using the IP directly avoids
the bootstrap dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: extract DOH_FALLBACK constant

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: extract QUAD9_IP constant

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove dead test helpers, fix constant placement

Remove unused get_srtt_ms() and saturated_penalty_cache() left over
from SRTT test rewrite. Move QUAD9_IP/DOH_FALLBACK after use block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: ignore ConnectionReset on UDP socket (Windows ICMP error)

Windows delivers ICMP port-unreachable as ConnectionReset on the
next UDP recv_from, crashing numa. Linux/macOS silently ignore these.
Catch and continue the recv loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: auto-start numa on Windows boot via registry Run key

Without a Windows Service, rebooting after numa install leaves DNS
broken (pointing at 127.0.0.1 with nothing listening). Register
numa in HKLM\...\Run so it starts automatically. Removed on
uninstall.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update README, Windows plan, and launch drafts for Windows support

- README: platform-specific Quick Start, install/uninstall table
- Windows plan: Phase 2 complete, Phase 3 scoped
- Launch drafts: updated "Does it support Windows?" response

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove docs from git tracking (already gitignored)

docs/ is in .gitignore but files were force-added. Remove from
tracking — files remain on disk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-01 18:17:52 +03:00
Razvan Dimescu
98da440c84 feat: forward-by-default, auto recursive mode, Linux install fixes (#27)
* feat: auto recursive mode, fix Linux install

Auto mode (new default): probes a root server on startup; uses
recursive resolution if outbound DNS works, falls back to Quad9 DoH
if blocked. Dashboard shows mode indicator (green/yellow).

Linux install fixes:
- Add DNSStubListener=no to resolved drop-in (frees port 53)
- Configure DNS before starting service (correct ordering)
- Skip 127.0.0.53 in upstream detection
- `numa install` now does everything (service + DNS + CA)
- `numa uninstall` mirrors install (stop service + restore DNS)
- Extract is_loopback_or_stub() for consistent filtering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: enable DNSSEC validation by default

With recursive as the default mode, DNSSEC validation completes the
trustless resolution chain. Strict mode remains off by default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: forward search domains to VPC resolver on Linux

Parse search/domain lines from resolv.conf and create conditional
forwarding rules to the original nameserver or AWS VPC resolver
(169.254.169.253). Fixes internal hostname resolution on cloud VMs
where recursive mode can't resolve private DNS zones.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: single-pass resolv.conf parsing, eliminate redundancies

Parse resolv.conf once for both upstream and search domains instead
of 2-3 reads. Extract CLOUD_VPC_RESOLVER constant. Use &'static str
for mode in StatsResponse. Remove dead read_upstream_from_file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: macOS install health check, harden recursive probe

Verify numa is listening (API port) before redirecting system DNS on
macOS — if the service fails to start (e.g. port 53 in use), unload
the service and abort instead of breaking DNS. Probe up to 3 root
hints before declaring recursive mode unavailable. Validate IPs from
resolvectl to avoid IPv6 fragment extraction. Extract DEFAULT_API_PORT
constant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: widen make_rule cfg gate to include Linux

make_rule was gated to macOS-only but discover_linux() calls it for
search domain forwarding rules. CI failed on Linux with E0425.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: forward mode as default, recursive opt-in

Forward mode (transparent proxy to system DNS) is now the default.
Recursive and auto modes are explicit opt-in via config. This avoids
bypassing corporate DNS policies, captive portals, VPC private zones,
and parental controls on first install.

- Move #[default] from Auto to Forward on UpstreamMode
- DNSSEC defaults to off (no-op in forward mode)
- 3-way match in main: Forward/Recursive/Auto with clean separation
- Post-install message suggests mode = "recursive" for sovereignty
- Update README, site, and launch drafts messaging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-01 08:49:16 +03:00
Razvan Dimescu
ff1200eb10 feat: resolve .numa services to LAN IP for remote clients (#23)
* feat: resolve .numa services to LAN IP for remote clients

Remote DNS clients (e.g. phones on same WiFi) received 127.0.0.1 for
local .numa services, which is unreachable from their perspective.
Now returns the host's LAN IP when the query originates from a
non-loopback address. Also auto-widens proxy bind to 0.0.0.0 when
DNS is already public, and adds a startup warning when the proxy
remains localhost-only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: respect proxy bind_addr config, don't auto-widen

The auto-widen silently overrode an explicit config value — the user's
config should be the source of truth. Now the proxy always uses the
configured bind_addr, and the warning fires whenever it's 127.0.0.1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update proxy bind_addr comment in example config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 23:15:42 +03:00
Razvan Dimescu
30e46e549c feat: in-flight query coalescing with COALESCED path (#20)
* feat: in-flight query coalescing for recursive resolver

When multiple queries for the same (domain, qtype) arrive concurrently
and all miss the cache, only the first triggers recursive resolution.
Subsequent queries wait on a broadcast channel for the result.

Prevents thundering herd where N concurrent cache misses each
independently walk the full NS chain, compounding timeouts.

Uses InflightGuard (Drop impl) to guarantee map cleanup on
panic/cancellation — prevents permanent SERVFAIL poisoning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: add InflightMap type alias for clippy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add COALESCED query path and coalescing tests

Followers in the inflight coalescing path now log as COALESCED instead
of RECURSIVE, making it visible in the dashboard when queries were
deduplicated vs independently resolved. Adds 10 tests covering
InflightGuard cleanup, broadcast mechanics, and concurrent handle_query
coalescing through a mock TCP DNS server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract acquire_inflight, rewrite tests against real code

Move Disposition enum and inflight acquisition logic into a standalone
acquire_inflight() function. Rewrite 4 tests that were exercising tokio
primitives to call the real coalescing code path instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 10:36:02 +03:00
Razvan Dimescu
06d4e91cd2 feat: SRTT-based nameserver selection (#19)
* feat: SRTT-based nameserver selection for recursive resolver

BIND-style Smoothed RTT (EWMA) tracking per NS IP address. The resolver
learns which nameservers respond fastest and prefers them, eliminating
cascading timeouts from slow/unreachable IPv6 servers.

- New src/srtt.rs: SrttCache with record_rtt, record_failure, sort_by_rtt
- EWMA formula: new = (old * 7 + sample) / 8, 5s failure penalty, 5min decay
- TCP penalty (+100ms) lets SRTT naturally deprioritize IPv6-over-TCP
- Enabled flag embedded in SrttCache (no-op when disabled)
- Batch eviction (64 entries) for O(1) amortized writes at capacity
- Configurable via [upstream] srtt = true/false (default: true)
- Benchmark script: scripts/benchmark.sh (full, cold, warm, compare-all)
- Benchmarks show 12x avg improvement, 0% queries >1s (was 58%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: show DNSSEC and SRTT status in dashboard + API

Add dnssec and srtt boolean fields to /stats API response.
Display on/off indicators in the dashboard footer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: apply SRTT decay before EWMA so recovered servers rehabilitate

Without decay-before-EWMA, a server penalized at 5000ms stayed near
that value even after recovery — the stale raw penalty was used as the
EWMA base instead of the decayed estimate. Extract decayed_srtt()
helper and call it in record_rtt() before the smoothing step.

Also restores removed "why" comments in send_query / resolve_recursive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add install/upgrade instructions, smarter benchmark priming

README: document `numa install`, `numa service`, Homebrew upgrade,
and `make deploy` workflows. Benchmark: replace fixed `sleep 4` with
`wait_for_priming` that polls cache entry count for stability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 23:22:31 +02:00
Razvan Dimescu
a84f2e7f1d feat: recursive DNS + DNSSEC + TCP fallback (#17)
* feat: recursive resolution + full DNSSEC validation

Numa becomes a true DNS resolver — resolves from root nameservers
with complete DNSSEC chain-of-trust verification.

Recursive resolution:
- Iterative RFC 1034 from configurable root hints (13 default)
- CNAME chasing (depth 8), referral following (depth 10)
- A+AAAA glue extraction, IPv6 nameserver support
- TLD priming: NS + DS + DNSKEY for 34 gTLDs + EU ccTLDs
- Config: mode = "recursive" in [upstream], root_hints, prime_tlds

DNSSEC (all 4 phases):
- EDNS0 OPT pseudo-record (DO bit, 1232 payload per DNS Flag Day 2020)
- DNSKEY, DS, RRSIG, NSEC, NSEC3 record types with wire read/write
- Signature verification via ring: RSA/SHA-256, ECDSA P-256, Ed25519
- Chain-of-trust: zone DNSKEY → parent DS → root KSK (key tag 20326)
- DNSKEY RRset self-signature verification (RRSIG(DNSKEY) by KSK)
- RRSIG expiration/inception time validation
- NSEC: NXDOMAIN gap proofs, NODATA type absence, wildcard denial
- NSEC3: SHA-1 iterated hashing, closest encloser proof, hash range
- Authority RRSIG verification for denial proofs
- Config: [dnssec] enabled/strict (default false, opt-in)
- AD bit on Secure, SERVFAIL on Bogus+strict
- DnssecStatus cached per entry, ValidationStats logging

Performance:
- TLD chain pre-warmed on startup (root DNSKEY + TLD DS/DNSKEY)
- Referral DS piggybacking from authority sections
- DNSKEY prefetch before validation loop
- Cold-cache validation: ~1 DNSKEY fetch (down from 5)
- Benchmarks: RSA 10.9µs, ECDSA 174ns, DS verify 257ns

Also:
- write_qname fix for root domain "." (was producing malformed queries)
- write_record_header() dedup, write_bytes() bulk writes
- DnsRecord::domain() + query_type() accessors
- UpstreamMode enum, DEFAULT_EDNS_PAYLOAD const
- Real glue TTL (was hardcoded 3600)
- DNSSEC restricted to recursive mode only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: TCP fallback, query minimization, UDP auto-disable

Transport resilience for restrictive networks (ISPs blocking UDP:53):
- DNS-over-TCP fallback: UDP fail/truncation → automatic TCP retry
- UDP auto-disable: after 3 consecutive failures, switch to TCP-first
- IPv6 → TCP directly (UDP socket binds 0.0.0.0, can't reach IPv6)
- Network change resets UDP detection for re-probing
- Root hint rotation in TLD priming

Privacy:
- RFC 7816 query minimization: root servers see TLD only, not full name

Code quality:
- Merged find_starting_ns + find_starting_zone → find_closest_ns
- Extracted resolve_ns_addrs_from_glue shared helper
- Removed overall timeout wrapper (per-hop timeouts sufficient)
- forward_tcp for DNS-over-TCP (RFC 1035 §4.2.2)

Testing:
- Mock TCP-only DNS server for fallback tests (no network needed)
- tcp_fallback_resolves_when_udp_blocked
- tcp_only_iterative_resolution
- tcp_fallback_handles_nxdomain
- udp_auto_disable_resets
- Integration test suite (4 suites, 51 tests)
- Network probe script (tests/network-probe.sh)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: DNSSEC verified badge in dashboard query log

- Add dnssec field to QueryLogEntry, track validation status per query
- DnssecStatus::as_str() for API serialization
- Dashboard shows green checkmark next to DNSSEC-verified responses
- Blog post: add "How keys get there" section, transport resilience section,
  trim code blocks, update What's Next

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use SVG shield for DNSSEC badge, update blog HTML

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: NS cache lookup from authorities, UDP re-probe, shield alignment

- find_closest_ns checks authorities (not just answers) for NS records,
  fixing TLD priming cache misses that caused redundant root queries
- Periodic UDP re-probe every 5min when disabled — re-enables UDP
  after switching from a restrictive network to an open one
- Dashboard DNSSEC shield uses fixed-width container for alignment
- Blog post: tuck key-tag into trust anchor paragraph

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: TCP single-write, mock server consistency, integration tests

- TCP single-write fix: combine length prefix + message to avoid split
  segments that Microsoft/Azure DNS servers reject
- Mock server (spawn_tcp_dns_server) updated to use single-write too
- Tests: forward_tcp_wire_format, forward_tcp_single_segment_write
- Integration: real-server checks for Microsoft/Office/Azure domains

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: recursive bar in dashboard, special-use domain interception

Dashboard:
- Add Recursive bar to resolution paths chart (cyan, distinct from Override)
- Add RECURSIVE path tag style in query log

Special-use domains (RFC 6761/6303/8880/9462):
- .localhost → 127.0.0.1 (RFC 6761)
- Private reverse PTR (10.x, 192.168.x, 172.16-31.x) → NXDOMAIN
- _dns.resolver.arpa (DDR) → NXDOMAIN
- ipv4only.arpa (NAT64) → 192.0.0.170/171
- mDNS service discovery for private ranges → NXDOMAIN

Eliminates ~900ms SERVFAILs for macOS system queries that were
hitting root servers unnecessarily.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: move generated blog HTML to site/blog/posts/, gitignore

- Generated HTML now in site/blog/posts/ (gitignored)
- CI workflow runs pandoc + make blog before deploy
- Updated all internal blog links to /blog/posts/ path
- blog/*.md remains the source of truth

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review feedback — memory ordering, RRSIG time, NS resolution

- Ordering::Relaxed → Acquire/Release for UDP_DISABLED/UDP_FAILURES
  (ARM correctness for cross-thread coordination)
- RRSIG time validation: serial number arithmetic (RFC 4034 §3.1.5)
  + 300s clock skew fudge factor (matches BIND)
- resolve_ns_addrs_from_glue collects addresses from ALL NS names,
  not just the first with glue (improves failover)
- is_special_use_domain: eliminate 16 format! allocations per
  .in-addr.arpa query (parse octet instead)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: API endpoint tests, coverage target

- 8 new axum handler tests: health, stats, query-log, overrides CRUD,
  cache, blocking stats, services CRUD, dashboard HTML
- Tests use tower::oneshot — no network, no server startup
- test_ctx() builds minimal ServerCtx for isolated testing
- `make coverage` target (cargo-tarpaulin), separate from `make all`
- 82 total tests (was 74)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 04:03:47 +02:00
Razvan Dimescu
962b400f4c perf: optimize DNS query hot path (#15)
* perf: optimize hot path — RwLock, inline filtering, pre-allocated strings

- Mutex → RwLock for cache, blocklist, and overrides (concurrent read access)
- Make cache.lookup() and overrides.lookup() take &self (read-only)
- Eliminate 3 Vec allocations per DnsPacket::write() via inline filtering
- Pre-allocate domain strings with capacity 64 in parse path
- Add criterion micro-benchmarks (hot_path + throughput)
- Add bench README documenting both benchmark suites

Measured improvement: ~14% faster parsing, ~9% pipeline throughput,
round-trip cached 733ns → 698ns (~2.3M queries/sec).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: simplify benchmark code after review

- Remove redundant DnsHeader::new() (already set by DnsPacket::new())
- Remove unused DnsHeader import
- Change simulate_cached_pipeline to take &DnsCache (lookup is &self now)
- Remove unnecessary mut on cache in cache_lookup_miss bench

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 02:01:08 +02:00
Razvan Dimescu
c5208e934d feat: DNS-over-HTTPS (DoH) upstream forwarding (#14)
* feat: DNS-over-HTTPS upstream forwarding

Encrypt upstream queries via DoH — ISPs see HTTPS traffic on port 443,
not plaintext DNS on port 53. URL scheme determines transport:
https:// = DoH, bare IP = plain UDP. Falls back to Quad9 DoH when
system resolver cannot be detected.

- Upstream enum (Udp/Doh) with Display and PartialEq
- BytePacketBuffer::from_bytes constructor
- reqwest http2 feature for DoH server compatibility
- network_watch_loop guards against DoH→UDP silent downgrade
- 5 new tests (mock DoH server, HTTP errors, timeout)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add DoH to README — Why Numa, comparison table, roadmap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 00:39:58 +02:00
Razvan Dimescu
e0c1997056 fix: regenerate TLS cert when services change (hot-reload via ArcSwap)
HTTPS proxy certs were generated once at startup. Services added at
runtime via API or LAN discovery got "not secure" in the browser
because their SAN wasn't in the cert. Now the cert is regenerated
on every service add/remove and swapped atomically via ArcSwap.
In-flight connections are unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 16:14:06 +02:00
Razvan Dimescu
e7e5c173f2 dynamic banner width, hoist HTML escaper, cache CA, restore log path
- banner box width adapts to longest value (fixes overflow with long paths)
- hoist h() HTML escape function to script top, remove 3 local copies
- serve_ca: add Cache-Control: public, max-age=86400
- restore log path in dashboard footer alongside new config/data fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 12:29:18 +02:00
Razvan Dimescu
c6b35045d8 config visibility, PR review fixes, XSS hardening
Config visibility:
- startup banner shows config path, data dir, services path
- config search: ./numa.toml → ~/.config/numa/ → /usr/local/var/numa/
- /stats API exposes config_path and data_dir, dashboard footer renders them
- GET /ca.pem endpoint serves CA cert for cross-device TLS trust
- load_config returns ConfigLoad with found flag, warns on not-found
- ServerCtx stores PathBuf for config_dir/data_dir, string conversion at boundaries

PR review fixes:
- add explicit parens in resolve_route operator precedence (service_store.rs)
- hostname portability: drop -s flag, trim domain with split('.') (lan.rs)
- serve_ca uses spawn_blocking instead of sync fs::read in async handler
- load_config: remove TOCTOU exists() check, read directly and handle NotFound

XSS hardening:
- HTML-escape all user-controlled interpolations in dashboard (service names,
  route paths, ports, URLs, block check domain/reason)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 12:24:21 +02:00
Razvan Dimescu
41a97bb930 dashboard: show LAN status in Local Services panel header
- Add lan_enabled to ServerCtx
- Add lan field to /stats API (enabled, peer count)
- Dashboard shows "LAN off" (dim) or "LAN on · N peers" (green)
- Tooltip shows enable command or mDNS service type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 11:16:52 +02:00
Razvan Dimescu
4020776b8e simplify set_lan_enabled: fix config path, TOCTOU, double iteration
- Accept config path parameter (consistent with main's resolution)
- Read first, match on NotFound (eliminates TOCTOU race)
- Single position() call replaces any() + position()
- Precise key matching via split_once('=')
- Preserve original indentation on replacement
- Extract print_lan_status helper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 10:59:35 +02:00
Razvan Dimescu
763ba1de91 add numa lan on/off CLI command, update README
- numa lan on/off toggles LAN discovery in numa.toml
- Writes [lan] section if missing, updates enabled if present
- Colored output with restart hint
- README: add lan on/off to help text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 10:30:22 +02:00
Razvan Dimescu
c836903db5 simplify: unify route structs, fix prefix collision, lint fixes
- Unify RouteConfig/RouteEntry/RouteResponse into single RouteEntry
- Fix prefix collision: /api no longer matches /apiary (segment boundary check)
- Add path traversal rejection in route API
- Extract MdnsAnnouncement struct (clippy type_complexity)
- cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 06:57:57 +02:00
Razvan Dimescu
5e5a6544bc LAN opt-in, mDNS migration, security hardening, path-based routing
- LAN discovery disabled by default (opt-in via [lan] enabled = true)
- Replace custom JSON multicast (239.255.70.78:5390) with standard mDNS
  (_numa._tcp.local on 224.0.0.251:5353) using existing DNS parser
- Instance ID in TXT record for multi-instance self-filtering
- API and proxy bind to 127.0.0.1 by default (0.0.0.0 when LAN enabled)
- Path-based routing: longest prefix match with optional prefix stripping
  via [[services]] routes = [{path, port, strip?}]
- REST API: GET/POST/DELETE /services/{name}/routes
- Dashboard shows route lines per service when configured
- Segment-boundary route matching (prevents /api matching /apiary)
- Route path validation (rejects path traversal)

Closes #11

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 06:56:31 +02:00
Razvan Dimescu
4c58ff49b0 reduce network change detection to 5s with tiered polling
LAN IP checked every 5s (cheap UDP socket call). Full upstream
re-detection runs every 30s as safety net, or immediately when
LAN IP changes. Reduces worst-case network switch recovery from
30s to 5s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 19:36:03 +02:00
Razvan Dimescu
06850de728 fix circular reference: detect DHCP DNS when scutil shows loopback
When numa install is active, scutil --dns only returns 127.0.0.1.
Previously fell back to 9.9.9.9 (Quad9) which fails on networks
that block external DNS. Now reads DHCP-provided DNS from
ipconfig getpacket en0/en1 as intermediate fallback before Quad9.

Tested on a network that blocks 8.8.8.8, 9.9.9.9, 1.1.1.1 but
allows ISP DNS (213.154.124.25) — Numa now auto-detects and uses it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 10:24:54 +02:00
Razvan Dimescu
995916d01b generalize upstream re-detection into network change watcher
Always detect network changes (LAN IP, upstream, peers) regardless
of upstream config. LAN IP is now tracked in ServerCtx and updated
every 30s — multicast announcements use the current IP instead of
the startup IP. Upstream re-detection still only runs when
auto-detected. Peer flush triggers on any network change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 09:38:09 +02:00
Razvan Dimescu
7aca3b1991 fix DNS failure on network change with upstream re-detection
Upstream DNS was resolved once at startup and never updated. Switching
Wi-Fi networks made all queries fail until restart.

Now spawns a background task (every 30s) that re-runs system DNS
discovery and swaps the upstream atomically if it changed. Also flushes
stale LAN peers from the old network on change.

Only activates when upstream is auto-detected (not explicitly configured).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 09:31:49 +02:00
Razvan Dimescu
c9f1d98f45 add LAN service discovery via UDP multicast
Numa instances on the same network auto-discover each other's .numa
services. No config, no cloud — just multicast on 239.255.70.78:5390.

- PeerStore with lazy expiry (90s timeout, 30s broadcast interval)
- DNS resolves remote .numa services to peer's LAN IP (not localhost)
- Proxy forwards to peer IP for remote services
- Graceful degradation if multicast bind fails

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 16:45:46 +02:00
Razvan Dimescu
5e7a653f9c fix rustfmt formatting
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 01:15:51 +02:00
Razvan Dimescu
3bfcd827ac add TLS, service persistence, blocking panel, query types
- Local TLS: auto-generated CA + per-service certs (explicit SANs, not
  wildcards — browsers reject *.numa under single-label TLDs). HTTPS
  proxy on :443 via rustls/tokio-rustls. `numa install` trusts CA in
  macOS Keychain / Linux ca-certificates.
- Service persistence: user-added services saved to
  ~/.config/numa/services.json, survive restarts.
- Blocking panel: renamed "Check Domain" to "Blocking" with sources
  display, allowlist management UI, unpause button.
- Query types: recognize SOA, PTR, TXT, SRV, HTTPS (type 65) instead
  of logging as UNKNOWN.
- Blocklist gzip: reqwest now decompresses gzip responses from CDNs.
- Unified config_dir() in lib.rs for consistent path resolution under
  sudo and launchd. TLS certs use /usr/local/var/numa/ (writable as
  root daemon).
- Dashboard UX: panel subtitles differentiating overrides vs services,
  better placeholders, proxy route display, 600px query log height.
- Deploy: make deploy handles build+copy+codesign+restart cycle.
- Demo: scripts/record-demo.sh for recording hero GIF with CDP.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 01:15:07 +02:00
Razvan Dimescu
8f959ce0a5 add local service proxy with .numa domains
HTTP reverse proxy on port 80 lets developers use clean domain names
(frontend.numa, api.numa) instead of localhost:PORT. Includes WebSocket
upgrade support for HMR, TCP health checks, dashboard UI panel, and
REST API for service management. numa.numa is preconfigured for the
dashboard itself.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 15:07:15 +02:00
Razvan Dimescu
14a9e9e7e3 add numa service restart command
Kills the running process and lets launchd/systemd respawn it
with the updated binary. DNS stays configured throughout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 14:26:56 +02:00
Razvan Dimescu
ee776938c5 add auto-detect upstream, install script, release workflow
- Default upstream auto-detected from system resolver (scutil/resolv.conf)
  instead of hardcoding Google 8.8.8.8. Falls back to Quad9 (9.9.9.9).
- Single scutil --dns pass for both upstream detection and forwarding rules
- Linux: reads backup resolv.conf if current only has loopback
- Service start/stop now couples DNS config (install on start, uninstall on stop)
- Install script for one-line binary install from GitHub Releases
- GitHub Actions release workflow: builds for macOS/Linux x86_64/aarch64

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 14:14:20 +02:00
Razvan Dimescu
0658ed7310 add service management CLI, log path in dashboard footer
- numa service start/stop/status commands (launchd + systemd)
- Dashboard footer shows log paths (macOS + Linux) and GitHub link
- Help text updated with all service commands

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 12:41:16 +02:00
Razvan Dimescu
2db44bd7d0 add system DNS auto-configuration (install/uninstall)
numa install  — saves current DNS, sets all network services to 127.0.0.1
numa uninstall — restores original DNS from ~/.numa/original-dns.json
numa help — shows usage

macOS: uses networksetup to enumerate services and set/restore DNS.
Linux: stubs with instructions for manual setup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 11:39:30 +02:00
Razvan Dimescu
4dc5b94c7a add ad blocking, live dashboard, system DNS auto-discovery
- DNS-level ad blocking: 385K+ domains via Hagezi Pro blocklist, subdomain
  matching, one-click allowlist, pause/toggle, background refresh every 24h
- Live dashboard at :5380 with real-time stats, query log, override
  management (create/edit/delete), blocking controls
- System DNS auto-discovery: parses scutil --dns on macOS to find
  conditional forwarding rules (Tailscale, VPN split-DNS)
- REST API expanded to 18 endpoints (blocking, overrides, diagnostics)
- Startup banner with colored system info
- Performance benchmarks (bench/dns-bench.sh)
- Landing page updated with new positioning and comparison table
- CI, Dockerfile, LICENSE, development plan docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 10:54:23 +02:00
Razvan Dimescu
89e7cbd989 add Makefile with clippy/rustfmt linting, fix all warnings
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 05:04:31 +02:00
Razvan Dimescu
9c71e9bb3f refactor to async tokio with modular architecture
- Replace synchronous std::net::UdpSocket with tokio async runtime
- Spawn concurrent task per incoming DNS query via tokio::spawn
- Extract monolithic main.rs into modules: buffer, header, question,
  record, packet, config, cache, forward, stats
- Share state across tasks via Arc<ServerCtx> with scoped Mutex locks
- Add TOML config loading, TTL-aware cache, structured logging, stats
- Add CLAUDE.md, README, dns_fun.toml config, and design docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:50:16 +02:00
razvandimescu
4e61caac45 first commit 2020-12-29 12:29:09 +02:00