Commit Graph

246 Commits

Author SHA1 Message Date
Razvan Dimescu
6887c8e02e refactor: move data_dir override from env var to [server] TOML field
Reverts the NUMA_DATA_DIR env var added in the previous commit and
replaces it with a [server] data_dir TOML field. Numa already has a
well-developed config system; adding a parallel env-var mechanism
for a single knob was wrong.

The principle: TOML is for application behavior configuration. Env
vars are for bootstrap values (HOME, SUDO_USER to discover paths
before config loads) and standard ecosystem conventions (RUST_LOG).
data_dir is neither — it's an app knob, so it belongs in the TOML.

Changes:
- lib.rs::data_dir() reverts to the platform-specific fallback only
- config.rs adds `data_dir: Option<PathBuf>` to ServerConfig
- main.rs resolves config.server.data_dir with fallback to
  numa::data_dir() and passes it to build_tls_config, then stores the
  resolved path on ctx.data_dir for downstream consumers
- tls.rs::build_tls_config takes `data_dir: &Path` as an explicit
  parameter instead of calling crate::data_dir() behind the caller's
  back. regenerate_tls and dot.rs self_signed_tls now pass
  &ctx.data_dir, honoring whatever path the config resolved to
- tests/integration.sh Suite 6 uses `data_dir = "$NUMA_DATA"` in its
  test TOML instead of the NUMA_DATA_DIR env var prefix
- numa.toml gains a commented-out data_dir example

No behavior change for existing production deployments (the default
path is unchanged). Test harness is now fully config-driven, and
containerized deploys can override data_dir via mount+config without
needing env var injection.

127/127 unit tests pass, Suite 6 passes end-to-end.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
7f52bd8a32 test: Suite 6 — proxy + DoT coexistence, NUMA_DATA_DIR override
Adds integration test coverage for the realistic production shape
where both the HTTPS proxy and DoT are enabled simultaneously. This
was previously untested — every existing suite had either one or the
other, so the interaction path was implicit.

What Suite 6 verifies:
- Both listeners bind without panic
- DoT still resolves queries with the proxy enabled
- Proxy HTTPS handshake still works with DoT enabled
- Both certs validate against the same shared CA

To run non-root, adds a NUMA_DATA_DIR env var override to data_dir()
that lets callers point the CA/cert storage at any writable path.
Useful beyond tests: containerized deployments, CI runners, dev
testing without sudo. The fallback is the existing platform-specific
path (unix: /usr/local/var/numa, windows: %PROGRAMDATA%\numa).

Suite 6 sets NUMA_DATA_DIR=/tmp/numa-integration-data before
starting numa, then trusts the generated CA at $NUMA_DATA_DIR/ca.pem
for both kdig (DoT query) and openssl s_client (HTTPS proxy
handshake) verification.

All 6 suites, 32 checks, run non-root and pass locally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
c98e6c3ea9 fix: install rustls crypto provider when loading user DoT cert
Adds tests/integration.sh Suite 5 (DoT via kdig + openssl) and
fixes a startup panic caught by it.

Bug: when [dot] cert_path/key_path was set AND [proxy] was disabled,
numa panicked on the first DoT handshake with "Could not
automatically determine the process-level CryptoProvider from Rustls
crate features". In normal deployments the proxy's build_tls_config
installs the default provider as a side effect, masking the missing
call in dot.rs::load_tls_config. Disable the proxy and the panic
surfaces. Fix: call
rustls::crypto::ring::default_provider().install_default() at the
top of load_tls_config (no-op if already installed).

Suite 5 exercises:
- DoT listener binds on configured port
- Resolves a local zone A record over TLS (kdig +tls)
- Persistent connection reuse (kdig +keepopen, 3 queries, 1 handshake)
- ALPN "dot" negotiation (openssl s_client -alpn dot)
- ALPN mismatch rejected with no_application_protocol (openssl -alpn h2)

Uses a pre-generated cert at /tmp so the test runs non-root.
Skips gracefully if kdig or openssl aren't installed.

Also: Dockerfile now EXPOSE 853/tcp so docker run -p 853:853 works
out of the box when users enable DoT.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
186e709373 test: verify DoT server rejects mismatched ALPN
Adds dot_rejects_non_dot_alpn to assert the rustls server enforces
ALPN strictness rather than silently accepting a mismatched
negotiation. This is the load-bearing behavior behind the cross-
protocol confusion defense — without enforcement, the ALPN "dot"
advertisement is just a sign hung on an unlocked door.

Refactors test_tls_configs to return the leaf cert DER instead of a
prebuilt client config, and adds a dot_client(cert_der, alpn) helper
so each test can build a client config with the ALPN list it needs.
The five existing DoT tests gain one line each to call dot_client
with dot_alpn(); behavior unchanged.

127/127 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
bacc49667a fix: DoT cert needs explicit {tld}.{tld} SAN, not just *.{tld} wildcard
self_signed_tls was passing an empty service_names list, so the
generated cert only had the *.numa wildcard SAN. Strict TLS clients
(browsers, possibly some iOS versions) reject wildcards under
single-label TLDs — see the existing comment in tls.rs explaining
why the proxy lists each service explicitly.

setup-phone's mobileconfig sends ServerName "numa.numa" as SNI, so
the DoT cert must have an explicit numa.numa SAN. Pass proxy_tld
itself as a service name, mirroring how main.rs already registers
"numa" as a service for the proxy's TLS cert.

Test fixture updated to mirror the production SAN shape (*.numa +
numa.numa) and switched the client to SNI "numa.numa", so the
existing DoT test suite implicitly exercises the SNI path used by
setup-phone clients.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
7d0fe19462 style: drop narrating comments on dot_alpn and ALPN test
Both were restating what the code already said — dot_alpn's doc
narrated the function name and the test comment restated the
assertion. RFC 7858 §3.2 is already cited on self_signed_tls and
build_tls_config where the "why" actually matters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
1632fc36f2 feat: DoT write timeout and ALPN "dot" advertisement
Two DoS/interop hardening items:

1. Bound write_framed by WRITE_TIMEOUT (10s) so a slow-reader
   attacker can't indefinitely hold a worker task and its connection
   permit. Symmetric to the existing handshake timeout.

2. Advertise ALPN "dot" per RFC 7858 §3.2. Required by some strict
   DoT clients (newer Apple stacks, some Android versions). rustls
   ServerConfig exposes alpn_protocols as a pub field so we set it
   after with_single_cert:
   - load_tls_config (user-provided cert/key): set directly
   - self_signed_tls (new, replaces fallback_tls): builds a fresh
     DoT-specific TLS config via build_tls_config with the ALPN list

   build_tls_config now takes an `alpn: Vec<Vec<u8>>` parameter so
   DoT and the proxy can pass different ALPN lists while sharing the
   same CA. Proxy callers pass Vec::new() (unchanged behavior).

   Dropped the ctx.tls_config reuse branch: we can't mutate a shared
   Arc<ServerConfig> to add DoT-specific ALPN, and reusing the proxy
   config was already quietly broken re: SAN (proxy cert covers
   *.{tld}, not the DoT server's bind hostname/IP).

Added dot_negotiates_alpn test that asserts conn.alpn_protocol()
returns Some(b"dot") after handshake. 126/126 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
0a73cdf4db docs: add commented-out [dot] example to numa.toml
Matches the style of the other opt-in sections (blocking, dnssec, lan).
Documents all five DotConfig fields with their defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
2b0c4e3d5e refactor: trim DoT listener — let-else reads, drop MIN_MSG_LEN and redundant localhost test
- Collapse two 4-arm read/timeout matches to let-else (lose one
  defensive debug log on payload-read timeout; idle timeouts are
  routine on persistent DoT connections anyway)
- Drop MIN_MSG_LEN: DnsPacket::from_buffer rejects truncated input
  on its own, and BytePacketBuffer is zero-init so buf[0..2] for
  sub-2-byte messages just yields a harmless FORMERR with id=0
- Inline ACCEPT_ERROR_BACKOFF (single use site)
- Drop the partial cert/key warning: missing one of cert_path/
  key_path silently falls back to self-signed; users see the
  self-signed cert at startup and figure it out
- Drop dot_localhost_resolution test: RFC 6761 localhost is tested
  in ctx.rs; this test only verified DoT transport, which
  dot_resolves_local_zone already covers
- Drop self-documenting comment in dot_multiple_queries_on_persistent_connection

Net -32 lines, 125/125 tests pass, no behavior change users would notice.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
357c710ec4 style: rustfmt
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
7742858b7b refactor: simplify DoT cert/key match and extract send_response helper
- Flatten 4-arm cert/key match in start_dot to 2 arms with the
  partial-config warning hoisted into a one-liner above the match.
- Extract send_response() that serializes a DnsPacket and writes it
  framed, used by both the FORMERR-on-parse-error and SERVFAIL-on-
  resolve-error paths. Removes duplicated buffer/write/log boilerplate
  and unifies the rescode logging via {:?}.

No behavior change; 126/126 tests still pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
1239ed0e72 fix: parse DoT queries up-front and echo question in SERVFAIL
Address review findings on PR #25:

- Refactor resolve_query to take a pre-parsed DnsPacket. Parse-error
  handling moves to the UDP caller, eliminating the double warn! line
  on malformed UDP queries.
- Enforce MIN_MSG_LEN=12 (DNS header) in handle_dot_connection so
  query_id extraction is always reading client-sent bytes, not the
  zeroed buffer tail.
- Parse the DoT query before calling resolve_query and retain it, so
  SERVFAIL responses can echo the original question section via
  response_from(). Parse failures send FORMERR with the client id.
- Extract write_framed() helper for length-prefix + flush, reused by
  success, SERVFAIL, and FORMERR paths.
- Back off 100ms on listener.accept() errors to avoid tight-looping
  on fd exhaustion.
- Replace the hardcoded 127.0.0.1:53 upstream in dot_nxdomain_for_unknown
  with a bound-but-unresponsive UDP socket owned by the test, making it
  independent of the host's local resolver. Test now runs in ~220ms
  (timeout lowered to 200ms) instead of 3s and asserts the question is
  echoed in the SERVFAIL response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
cb54ab3dfc fix: harden DoT listener against slowloris and stale handshakes
- Add 10s timeout on TLS handshake — prevents clients from holding a
  semaphore permit without completing the handshake
- Add IDLE_TIMEOUT on payload read_exact — prevents slowloris after
  sending a valid length prefix then trickling bytes
- Extract accept_loop() shared between start_dot and tests — eliminates
  duplicated accept logic that could drift
- Add 5s timeout on TCP reads in recursive test mock server

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
aa8923b2c6 fix: add debug logging for DoT SERVFAIL serialization failure, TC-bit TODO
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
14efc51340 fix: send SERVFAIL on DoT resolve errors, extract shared connection handler
- Send SERVFAIL response (with correct query ID) when resolve_query
  fails, preventing DoT clients from hanging until idle timeout
- Extract handle_dot_connection() so tests use the same logic as
  production, eliminating duplicated accept/read/resolve loop
- Replace magic 4096 with named MAX_MSG_LEN constant tied to BUF_SIZE
- Add flush() after each TLS write to prevent buffered responses
- Extract fallback_tls() helper, handle partial cert/key config,
  support IPv6 bind address, remove redundant crypto provider init

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
e4350ae81c feat: add DNS-over-TLS (DoT) listener (RFC 7858)
Refactor handle_query into transport-agnostic resolve_query that returns
a BytePacketBuffer, keeping the UDP path zero-alloc. Add a TLS listener
on port 853 with persistent connections, idle timeout, connection limits,
and coalesced writes. Supports user-provided certs or self-signed CA
fallback. Includes 5 integration tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:53:43 +03:00
Razvan Dimescu
766935ec97 style: fix rustfmt formatting
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:46:54 +03:00
Razvan Dimescu
efe3669540 fix: gate exe_path and replace_exe_path for Windows clippy, add macOS CI
- Gate exe_path in restart_service() and replace_exe_path() behind
  #[cfg(any(target_os = "macos", target_os = "linux"))] to fix
  unused variable and dead code warnings on Windows
- Add macOS CI job (clippy + tests)
- Add test for template substitution in plist and systemd unit files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:46:54 +03:00
Laurin Brandner
ad34fe2d9e Fix unit replacement for linux 2026-04-06 22:28:30 +03:00
Laurin Brandner
80fcfd10ae flexible installation path 2026-04-06 22:28:30 +03:00
Razvan Dimescu
e4a8893214 Merge pull request #30 from razvandimescu/release/v0.9.1
chore: bump version to 0.9.1
v0.9.1
2026-04-03 00:39:45 +03:00
Razvan Dimescu
d979cd9505 chore: bump version to 0.9.1
Fix: forwarding rules ignored in recursive mode (Tailscale/VPN).
Fix: browsers treating .numa as search query (add search domain).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 00:08:36 +03:00
Razvan Dimescu
8c421b9fa3 fix: check forwarding rules before recursive resolution (#29)
Conditional forwarding (Tailscale .ts.net, VPC private zones) was
only checked in the forward mode branch. In recursive mode, queries
for forwarding-rule domains went to root servers instead of the
configured upstream, returning NXDOMAIN for private domains.

Move the forwarding rule check before the recursive/forward branch
so it takes priority regardless of mode.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 00:07:11 +03:00
Razvan Dimescu
ad7884f2f6 fix: add numa search domain on install for browser compatibility
Chrome treats single-label TLDs (e.g. frontend.numa) as search
queries unless a trailing slash is added. Adding "numa" as a search
domain tells the OS resolver that .numa is valid, so browsers
resolve it directly.

macOS: networksetup -setsearchdomains, cleared on uninstall
Linux (resolved): Domains=~. numa in drop-in
Linux (resolv.conf): search numa

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 17:50:22 +03:00
Razvan Dimescu
6a70ab0f1b chore: bump version to 0.9.0
New: Windows DNS configuration (install/uninstall/auto-start).
Fix: DoH fallback uses IP to avoid DNS bootstrap loop.
Fix: UDP ConnectionReset crash on Windows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.9.0
2026-04-01 18:22:18 +03:00
Razvan Dimescu
0b883d1c0d feat: Windows DNS configuration via netsh (#28)
* feat: Windows DNS configuration via netsh

numa install/uninstall now set/restore system DNS on Windows via
netsh. Parses ipconfig /all per-interface (adapter name, DHCP status,
DNS servers), saves backup to %APPDATA%\numa\original-dns.json, and
restores on uninstall (DHCP or static with secondary servers).

Handles localization (German adapter/DHCP/DNS labels), disconnected
adapters, multiple interfaces, and missing admin privileges. Adds IP
validation to discover_windows() for consistency.

No Windows Service or CA trust yet — user runs numa in a terminal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: add cargo test to Windows CI job

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: upload Windows binary as artifact for testing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: SRTT decay tests panic on Windows due to Instant underflow

On Windows, Instant starts near boot time — subtracting large
durations panics. Use checked_sub with a process-start fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: SRTT decay tests use binary search for max Instant age

Replace age() helper with set_age_secs() on SrttCache that
binary-searches for the maximum subtractable duration. Prevents
panic on Windows (Instant starts at boot) while still producing
the oldest representable instant for correct decay calculations.

Also removes ephemeral test-ubuntu.sh from git.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use ProgramData for Windows DNS backup path

APPDATA differs between user and admin contexts — install runs as
admin but uninstall might resolve a different APPDATA. Use
ProgramData which is consistent across elevation contexts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: disable Dnscache on Windows install, re-enable on uninstall

Windows DNS Client (Dnscache) holds port 53 at kernel level and
can't be stopped via sc/net stop. Disable via registry during
install (requires reboot), re-enable on uninstall.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: rewrite SRTT decay tests as pure functions

Decay tests manipulated Instant timestamps which panics on Windows
(Instant can't go before boot time). Rewrite to test decay_for_age()
directly — a pure function taking srtt_ms and age_secs, no platform
dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use Quad9 IP (9.9.9.9) for DoH fallback, not hostname

DoH to dns.quad9.net requires DNS to resolve the hostname, which
creates a chicken-and-egg loop when numa IS the system resolver
(e.g. after numa install on Windows). Using the IP directly avoids
the bootstrap dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: extract DOH_FALLBACK constant

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: extract QUAD9_IP constant

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove dead test helpers, fix constant placement

Remove unused get_srtt_ms() and saturated_penalty_cache() left over
from SRTT test rewrite. Move QUAD9_IP/DOH_FALLBACK after use block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: ignore ConnectionReset on UDP socket (Windows ICMP error)

Windows delivers ICMP port-unreachable as ConnectionReset on the
next UDP recv_from, crashing numa. Linux/macOS silently ignore these.
Catch and continue the recv loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: auto-start numa on Windows boot via registry Run key

Without a Windows Service, rebooting after numa install leaves DNS
broken (pointing at 127.0.0.1 with nothing listening). Register
numa in HKLM\...\Run so it starts automatically. Removed on
uninstall.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update README, Windows plan, and launch drafts for Windows support

- README: platform-specific Quick Start, install/uninstall table
- Windows plan: Phase 2 complete, Phase 3 scoped
- Launch drafts: updated "Does it support Windows?" response

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove docs from git tracking (already gitignored)

docs/ is in .gitignore but files were force-added. Remove from
tracking — files remain on disk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-01 18:17:52 +03:00
Razvan Dimescu
7f46f6271e docs: surface three resolution modes in README
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 09:28:44 +03:00
Razvan Dimescu
f3ca83246c chore: bump version to 0.8.0
Breaking: default mode changed from auto to forward.
New: memory footprint stats + dashboard panel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 09:11:34 +03:00
Razvan Dimescu
da93a3cde3 feat: add memory footprint to /stats and dashboard (#26)
* feat: add memory footprint to /stats and dashboard

Per-structure heap estimation (cache, blocklist, query log, SRTT,
overrides) with process RSS via mach_task_basic_info / sysconf.
Dashboard gets a 6th stat card and a sidebar breakdown panel with
stacked bar visualization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use phys_footprint on macOS to match Activity Monitor

Switch from MACH_TASK_BASIC_INFO (resident_size) to TASK_VM_INFO
(phys_footprint) which matches Activity Monitor's Memory column.
Also: capacity-aware heap estimation, entry counts in memory payload,
heap_bytes tests for all stores.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove redundant fields and fix naming in memory stats

Remove duplicate entry counts from MemoryStats (already in parent
StatsResponse), rename process_rss_bytes to process_memory_bytes
to match macOS phys_footprint semantics, drop restating comments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 09:09:44 +03:00
Razvan Dimescu
98da440c84 feat: forward-by-default, auto recursive mode, Linux install fixes (#27)
* feat: auto recursive mode, fix Linux install

Auto mode (new default): probes a root server on startup; uses
recursive resolution if outbound DNS works, falls back to Quad9 DoH
if blocked. Dashboard shows mode indicator (green/yellow).

Linux install fixes:
- Add DNSStubListener=no to resolved drop-in (frees port 53)
- Configure DNS before starting service (correct ordering)
- Skip 127.0.0.53 in upstream detection
- `numa install` now does everything (service + DNS + CA)
- `numa uninstall` mirrors install (stop service + restore DNS)
- Extract is_loopback_or_stub() for consistent filtering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: enable DNSSEC validation by default

With recursive as the default mode, DNSSEC validation completes the
trustless resolution chain. Strict mode remains off by default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: forward search domains to VPC resolver on Linux

Parse search/domain lines from resolv.conf and create conditional
forwarding rules to the original nameserver or AWS VPC resolver
(169.254.169.253). Fixes internal hostname resolution on cloud VMs
where recursive mode can't resolve private DNS zones.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: single-pass resolv.conf parsing, eliminate redundancies

Parse resolv.conf once for both upstream and search domains instead
of 2-3 reads. Extract CLOUD_VPC_RESOLVER constant. Use &'static str
for mode in StatsResponse. Remove dead read_upstream_from_file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: macOS install health check, harden recursive probe

Verify numa is listening (API port) before redirecting system DNS on
macOS — if the service fails to start (e.g. port 53 in use), unload
the service and abort instead of breaking DNS. Probe up to 3 root
hints before declaring recursive mode unavailable. Validate IPs from
resolvectl to avoid IPv6 fragment extraction. Extract DEFAULT_API_PORT
constant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: widen make_rule cfg gate to include Linux

make_rule was gated to macOS-only but discover_linux() calls it for
search domain forwarding rules. CI failed on Linux with E0425.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: forward mode as default, recursive opt-in

Forward mode (transparent proxy to system DNS) is now the default.
Recursive and auto modes are explicit opt-in via config. This avoids
bypassing corporate DNS policies, captive portals, VPC private zones,
and parental controls on first install.

- Move #[default] from Auto to Forward on UpstreamMode
- DNSSEC defaults to off (no-op in forward mode)
- 3-way match in main: Forward/Recursive/Auto with clean separation
- Post-install message suggests mode = "recursive" for sovereignty
- Update README, site, and launch drafts messaging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-01 08:49:16 +03:00
Razvan Dimescu
4e5b88496c fix: include recursive and coalesced queries in cache hit rate denominator (#24)
The cache hit rate was computed as cached/(cached+forwarded+local+overridden),
excluding recursive and coalesced queries from the denominator. This inflated
the displayed rate (e.g. 57.9%) far above the actual cache proportion (20.9%).

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 00:17:40 +03:00
Razvan Dimescu
d5f7ce9e2d chore: updated install methods 2026-03-29 23:33:45 +03:00
Razvan Dimescu
cc704be590 chore: bump version to 0.7.3 2026-03-29 23:16:46 +03:00
Razvan Dimescu
ff1200eb10 feat: resolve .numa services to LAN IP for remote clients (#23)
* feat: resolve .numa services to LAN IP for remote clients

Remote DNS clients (e.g. phones on same WiFi) received 127.0.0.1 for
local .numa services, which is unreachable from their perspective.
Now returns the host's LAN IP when the query originates from a
non-loopback address. Also auto-widens proxy bind to 0.0.0.0 when
DNS is already public, and adds a startup warning when the proxy
remains localhost-only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: respect proxy bind_addr config, don't auto-widen

The auto-widen silently overrode an explicit config value — the user's
config should be the source of truth. Now the proxy always uses the
configured bind_addr, and the warning fires whenever it's 127.0.0.1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update proxy bind_addr comment in example config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 23:15:42 +03:00
Razvan Dimescu
49535568d9 refactor: deduplicate query builders, record extraction, sinkhole records (#22)
- Add DnsPacket::query(id, domain, qtype) constructor; replace mock_query,
  make_query, and 4 inline constructions across ctx/forward/recursive/api
- Add record_to_addr() in recursive.rs; replace 4 identical A/AAAA match
  blocks with filter_map one-liners
- Add sinkhole_record() in ctx.rs; consolidate localhost and blocklist
  A/AAAA branching into single calls
- Remove now-unused DnsQuestion imports

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 14:22:07 +03:00
Razvan Dimescu
cd1beedf38 chore: bump version to 0.7.2 2026-03-29 11:44:10 +03:00
Razvan Dimescu
be52e5c305 docs: streamline README for clarity and scannability
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 11:42:08 +03:00
Razvan Dimescu
669498e85f refactor: extract resolve_coalesced, test real code (#21)
* refactor: extract resolve_coalesced, rewrite tests against real code

Extract Disposition enum, acquire_inflight(), and resolve_coalesced()
from handle_query so coalescing logic is independently testable. Rewrite
integration tests to call resolve_coalesced directly with mock futures
instead of fighting the iterative resolver's NS chain. All 12 coalescing
tests now exercise production code paths, not tokio primitives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: SERVFAIL echoes question section, preserve error messages

resolve_coalesced now takes &DnsPacket instead of query_id so SERVFAIL
responses use response_from (echoing question section per RFC). Error
messages preserved via Option<String> return for upstream error logging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 11:14:25 +03:00
Razvan Dimescu
d325b92e44 chore: bump version to 0.7.1 2026-03-29 10:39:17 +03:00
Razvan Dimescu
261fd2e148 blog: add DNSSEC chain-of-trust SVG diagram
Replace text-based chain trace with a visual diagram showing the
verification flow from cloudflare.com through .com TLD to root
trust anchor. Matches site color palette and typography.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 10:38:47 +03:00
Razvan Dimescu
30e46e549c feat: in-flight query coalescing with COALESCED path (#20)
* feat: in-flight query coalescing for recursive resolver

When multiple queries for the same (domain, qtype) arrive concurrently
and all miss the cache, only the first triggers recursive resolution.
Subsequent queries wait on a broadcast channel for the result.

Prevents thundering herd where N concurrent cache misses each
independently walk the full NS chain, compounding timeouts.

Uses InflightGuard (Drop impl) to guarantee map cleanup on
panic/cancellation — prevents permanent SERVFAIL poisoning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: add InflightMap type alias for clippy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add COALESCED query path and coalescing tests

Followers in the inflight coalescing path now log as COALESCED instead
of RECURSIVE, making it visible in the dashboard when queries were
deduplicated vs independently resolved. Adds 10 tests covering
InflightGuard cleanup, broadcast mechanics, and concurrent handle_query
coalescing through a mock TCP DNS server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract acquire_inflight, rewrite tests against real code

Move Disposition enum and inflight acquisition logic into a standalone
acquire_inflight() function. Rewrite 4 tests that were exercising tokio
primitives to call the real coalescing code path instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 10:36:02 +03:00
Razvan Dimescu
ac49658c2b chore: add release script and make target
Usage: make release VERSION=0.8.0
Bumps Cargo.toml + Cargo.lock, commits, tags, pushes — triggers
the existing GitHub Actions release workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:33:58 +03:00
Razvan Dimescu
5265f571d0 chore: update Cargo.lock for 0.7.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:22:32 +03:00
Razvan Dimescu
0ebd924825 chore: bump version to 0.7.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:16:26 +03:00
Razvan Dimescu
06d4e91cd2 feat: SRTT-based nameserver selection (#19)
* feat: SRTT-based nameserver selection for recursive resolver

BIND-style Smoothed RTT (EWMA) tracking per NS IP address. The resolver
learns which nameservers respond fastest and prefers them, eliminating
cascading timeouts from slow/unreachable IPv6 servers.

- New src/srtt.rs: SrttCache with record_rtt, record_failure, sort_by_rtt
- EWMA formula: new = (old * 7 + sample) / 8, 5s failure penalty, 5min decay
- TCP penalty (+100ms) lets SRTT naturally deprioritize IPv6-over-TCP
- Enabled flag embedded in SrttCache (no-op when disabled)
- Batch eviction (64 entries) for O(1) amortized writes at capacity
- Configurable via [upstream] srtt = true/false (default: true)
- Benchmark script: scripts/benchmark.sh (full, cold, warm, compare-all)
- Benchmarks show 12x avg improvement, 0% queries >1s (was 58%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: show DNSSEC and SRTT status in dashboard + API

Add dnssec and srtt boolean fields to /stats API response.
Display on/off indicators in the dashboard footer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: apply SRTT decay before EWMA so recovered servers rehabilitate

Without decay-before-EWMA, a server penalized at 5000ms stayed near
that value even after recovery — the stale raw penalty was used as the
EWMA base instead of the decayed estimate. Extract decayed_srtt()
helper and call it in record_rtt() before the smoothing step.

Also restores removed "why" comments in send_query / resolve_recursive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add install/upgrade instructions, smarter benchmark priming

README: document `numa install`, `numa service`, Homebrew upgrade,
and `make deploy` workflows. Benchmark: replace fixed `sleep 4` with
`wait_for_priming` that polls cache entry count for stability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 23:22:31 +02:00
Razvan Dimescu
71dbb138bc fix: return NXDOMAIN for .local queries instead of SERVFAIL (#18)
.local is reserved for mDNS (RFC 6762) and cannot be resolved by
upstream DNS servers. Add it to is_special_use_domain() so queries
like _grpc_config.localhost.local get an immediate NXDOMAIN instead
of timing out and returning SERVFAIL.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 22:42:33 +02:00
Razvan Dimescu
fbf3ca6d11 chore: bump version to 0.6.0
Recursive DNS resolution, full DNSSEC validation, TCP fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 04:12:28 +02:00
Razvan Dimescu
a84f2e7f1d feat: recursive DNS + DNSSEC + TCP fallback (#17)
* feat: recursive resolution + full DNSSEC validation

Numa becomes a true DNS resolver — resolves from root nameservers
with complete DNSSEC chain-of-trust verification.

Recursive resolution:
- Iterative RFC 1034 from configurable root hints (13 default)
- CNAME chasing (depth 8), referral following (depth 10)
- A+AAAA glue extraction, IPv6 nameserver support
- TLD priming: NS + DS + DNSKEY for 34 gTLDs + EU ccTLDs
- Config: mode = "recursive" in [upstream], root_hints, prime_tlds

DNSSEC (all 4 phases):
- EDNS0 OPT pseudo-record (DO bit, 1232 payload per DNS Flag Day 2020)
- DNSKEY, DS, RRSIG, NSEC, NSEC3 record types with wire read/write
- Signature verification via ring: RSA/SHA-256, ECDSA P-256, Ed25519
- Chain-of-trust: zone DNSKEY → parent DS → root KSK (key tag 20326)
- DNSKEY RRset self-signature verification (RRSIG(DNSKEY) by KSK)
- RRSIG expiration/inception time validation
- NSEC: NXDOMAIN gap proofs, NODATA type absence, wildcard denial
- NSEC3: SHA-1 iterated hashing, closest encloser proof, hash range
- Authority RRSIG verification for denial proofs
- Config: [dnssec] enabled/strict (default false, opt-in)
- AD bit on Secure, SERVFAIL on Bogus+strict
- DnssecStatus cached per entry, ValidationStats logging

Performance:
- TLD chain pre-warmed on startup (root DNSKEY + TLD DS/DNSKEY)
- Referral DS piggybacking from authority sections
- DNSKEY prefetch before validation loop
- Cold-cache validation: ~1 DNSKEY fetch (down from 5)
- Benchmarks: RSA 10.9µs, ECDSA 174ns, DS verify 257ns

Also:
- write_qname fix for root domain "." (was producing malformed queries)
- write_record_header() dedup, write_bytes() bulk writes
- DnsRecord::domain() + query_type() accessors
- UpstreamMode enum, DEFAULT_EDNS_PAYLOAD const
- Real glue TTL (was hardcoded 3600)
- DNSSEC restricted to recursive mode only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: TCP fallback, query minimization, UDP auto-disable

Transport resilience for restrictive networks (ISPs blocking UDP:53):
- DNS-over-TCP fallback: UDP fail/truncation → automatic TCP retry
- UDP auto-disable: after 3 consecutive failures, switch to TCP-first
- IPv6 → TCP directly (UDP socket binds 0.0.0.0, can't reach IPv6)
- Network change resets UDP detection for re-probing
- Root hint rotation in TLD priming

Privacy:
- RFC 7816 query minimization: root servers see TLD only, not full name

Code quality:
- Merged find_starting_ns + find_starting_zone → find_closest_ns
- Extracted resolve_ns_addrs_from_glue shared helper
- Removed overall timeout wrapper (per-hop timeouts sufficient)
- forward_tcp for DNS-over-TCP (RFC 1035 §4.2.2)

Testing:
- Mock TCP-only DNS server for fallback tests (no network needed)
- tcp_fallback_resolves_when_udp_blocked
- tcp_only_iterative_resolution
- tcp_fallback_handles_nxdomain
- udp_auto_disable_resets
- Integration test suite (4 suites, 51 tests)
- Network probe script (tests/network-probe.sh)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: DNSSEC verified badge in dashboard query log

- Add dnssec field to QueryLogEntry, track validation status per query
- DnssecStatus::as_str() for API serialization
- Dashboard shows green checkmark next to DNSSEC-verified responses
- Blog post: add "How keys get there" section, transport resilience section,
  trim code blocks, update What's Next

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use SVG shield for DNSSEC badge, update blog HTML

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: NS cache lookup from authorities, UDP re-probe, shield alignment

- find_closest_ns checks authorities (not just answers) for NS records,
  fixing TLD priming cache misses that caused redundant root queries
- Periodic UDP re-probe every 5min when disabled — re-enables UDP
  after switching from a restrictive network to an open one
- Dashboard DNSSEC shield uses fixed-width container for alignment
- Blog post: tuck key-tag into trust anchor paragraph

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: TCP single-write, mock server consistency, integration tests

- TCP single-write fix: combine length prefix + message to avoid split
  segments that Microsoft/Azure DNS servers reject
- Mock server (spawn_tcp_dns_server) updated to use single-write too
- Tests: forward_tcp_wire_format, forward_tcp_single_segment_write
- Integration: real-server checks for Microsoft/Office/Azure domains

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: recursive bar in dashboard, special-use domain interception

Dashboard:
- Add Recursive bar to resolution paths chart (cyan, distinct from Override)
- Add RECURSIVE path tag style in query log

Special-use domains (RFC 6761/6303/8880/9462):
- .localhost → 127.0.0.1 (RFC 6761)
- Private reverse PTR (10.x, 192.168.x, 172.16-31.x) → NXDOMAIN
- _dns.resolver.arpa (DDR) → NXDOMAIN
- ipv4only.arpa (NAT64) → 192.0.0.170/171
- mDNS service discovery for private ranges → NXDOMAIN

Eliminates ~900ms SERVFAILs for macOS system queries that were
hitting root servers unnecessarily.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: move generated blog HTML to site/blog/posts/, gitignore

- Generated HTML now in site/blog/posts/ (gitignored)
- CI workflow runs pandoc + make blog before deploy
- Updated all internal blog links to /blog/posts/ path
- blog/*.md remains the source of truth

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review feedback — memory ordering, RRSIG time, NS resolution

- Ordering::Relaxed → Acquire/Release for UDP_DISABLED/UDP_FAILURES
  (ARM correctness for cross-thread coordination)
- RRSIG time validation: serial number arithmetic (RFC 4034 §3.1.5)
  + 300s clock skew fudge factor (matches BIND)
- resolve_ns_addrs_from_glue collects addresses from ALL NS names,
  not just the first with glue (improves failover)
- is_special_use_domain: eliminate 16 format! allocations per
  .in-addr.arpa query (parse octet instead)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: API endpoint tests, coverage target

- 8 new axum handler tests: health, stats, query-log, overrides CRUD,
  cache, blocking stats, services CRUD, dashboard HTML
- Tests use tower::oneshot — no network, no server startup
- test_ctx() builds minimal ServerCtx for isolated testing
- `make coverage` target (cargo-tarpaulin), separate from `make all`
- 82 total tests (was 74)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 04:03:47 +02:00
Razvan Dimescu
7aee90c99b add Dev.to cover image (dashboard screenshot 1000x420)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 03:20:28 +02:00
Razvan Dimescu
1304b1c02c docs: update README — add numa.rs link, benchmarks, Windows support
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 02:28:38 +02:00