Commit Graph

12 Commits

Author SHA1 Message Date
Razvan Dimescu
06d4e91cd2 feat: SRTT-based nameserver selection (#19)
* feat: SRTT-based nameserver selection for recursive resolver

BIND-style Smoothed RTT (EWMA) tracking per NS IP address. The resolver
learns which nameservers respond fastest and prefers them, eliminating
cascading timeouts from slow/unreachable IPv6 servers.

- New src/srtt.rs: SrttCache with record_rtt, record_failure, sort_by_rtt
- EWMA formula: new = (old * 7 + sample) / 8, 5s failure penalty, 5min decay
- TCP penalty (+100ms) lets SRTT naturally deprioritize IPv6-over-TCP
- Enabled flag embedded in SrttCache (no-op when disabled)
- Batch eviction (64 entries) for O(1) amortized writes at capacity
- Configurable via [upstream] srtt = true/false (default: true)
- Benchmark script: scripts/benchmark.sh (full, cold, warm, compare-all)
- Benchmarks show 12x avg improvement, 0% queries >1s (was 58%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: show DNSSEC and SRTT status in dashboard + API

Add dnssec and srtt boolean fields to /stats API response.
Display on/off indicators in the dashboard footer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: apply SRTT decay before EWMA so recovered servers rehabilitate

Without decay-before-EWMA, a server penalized at 5000ms stayed near
that value even after recovery — the stale raw penalty was used as the
EWMA base instead of the decayed estimate. Extract decayed_srtt()
helper and call it in record_rtt() before the smoothing step.

Also restores removed "why" comments in send_query / resolve_recursive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add install/upgrade instructions, smarter benchmark priming

README: document `numa install`, `numa service`, Homebrew upgrade,
and `make deploy` workflows. Benchmark: replace fixed `sleep 4` with
`wait_for_priming` that polls cache entry count for stability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 23:22:31 +02:00
Razvan Dimescu
a84f2e7f1d feat: recursive DNS + DNSSEC + TCP fallback (#17)
* feat: recursive resolution + full DNSSEC validation

Numa becomes a true DNS resolver — resolves from root nameservers
with complete DNSSEC chain-of-trust verification.

Recursive resolution:
- Iterative RFC 1034 from configurable root hints (13 default)
- CNAME chasing (depth 8), referral following (depth 10)
- A+AAAA glue extraction, IPv6 nameserver support
- TLD priming: NS + DS + DNSKEY for 34 gTLDs + EU ccTLDs
- Config: mode = "recursive" in [upstream], root_hints, prime_tlds

DNSSEC (all 4 phases):
- EDNS0 OPT pseudo-record (DO bit, 1232 payload per DNS Flag Day 2020)
- DNSKEY, DS, RRSIG, NSEC, NSEC3 record types with wire read/write
- Signature verification via ring: RSA/SHA-256, ECDSA P-256, Ed25519
- Chain-of-trust: zone DNSKEY → parent DS → root KSK (key tag 20326)
- DNSKEY RRset self-signature verification (RRSIG(DNSKEY) by KSK)
- RRSIG expiration/inception time validation
- NSEC: NXDOMAIN gap proofs, NODATA type absence, wildcard denial
- NSEC3: SHA-1 iterated hashing, closest encloser proof, hash range
- Authority RRSIG verification for denial proofs
- Config: [dnssec] enabled/strict (default false, opt-in)
- AD bit on Secure, SERVFAIL on Bogus+strict
- DnssecStatus cached per entry, ValidationStats logging

Performance:
- TLD chain pre-warmed on startup (root DNSKEY + TLD DS/DNSKEY)
- Referral DS piggybacking from authority sections
- DNSKEY prefetch before validation loop
- Cold-cache validation: ~1 DNSKEY fetch (down from 5)
- Benchmarks: RSA 10.9µs, ECDSA 174ns, DS verify 257ns

Also:
- write_qname fix for root domain "." (was producing malformed queries)
- write_record_header() dedup, write_bytes() bulk writes
- DnsRecord::domain() + query_type() accessors
- UpstreamMode enum, DEFAULT_EDNS_PAYLOAD const
- Real glue TTL (was hardcoded 3600)
- DNSSEC restricted to recursive mode only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: TCP fallback, query minimization, UDP auto-disable

Transport resilience for restrictive networks (ISPs blocking UDP:53):
- DNS-over-TCP fallback: UDP fail/truncation → automatic TCP retry
- UDP auto-disable: after 3 consecutive failures, switch to TCP-first
- IPv6 → TCP directly (UDP socket binds 0.0.0.0, can't reach IPv6)
- Network change resets UDP detection for re-probing
- Root hint rotation in TLD priming

Privacy:
- RFC 7816 query minimization: root servers see TLD only, not full name

Code quality:
- Merged find_starting_ns + find_starting_zone → find_closest_ns
- Extracted resolve_ns_addrs_from_glue shared helper
- Removed overall timeout wrapper (per-hop timeouts sufficient)
- forward_tcp for DNS-over-TCP (RFC 1035 §4.2.2)

Testing:
- Mock TCP-only DNS server for fallback tests (no network needed)
- tcp_fallback_resolves_when_udp_blocked
- tcp_only_iterative_resolution
- tcp_fallback_handles_nxdomain
- udp_auto_disable_resets
- Integration test suite (4 suites, 51 tests)
- Network probe script (tests/network-probe.sh)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: DNSSEC verified badge in dashboard query log

- Add dnssec field to QueryLogEntry, track validation status per query
- DnssecStatus::as_str() for API serialization
- Dashboard shows green checkmark next to DNSSEC-verified responses
- Blog post: add "How keys get there" section, transport resilience section,
  trim code blocks, update What's Next

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use SVG shield for DNSSEC badge, update blog HTML

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: NS cache lookup from authorities, UDP re-probe, shield alignment

- find_closest_ns checks authorities (not just answers) for NS records,
  fixing TLD priming cache misses that caused redundant root queries
- Periodic UDP re-probe every 5min when disabled — re-enables UDP
  after switching from a restrictive network to an open one
- Dashboard DNSSEC shield uses fixed-width container for alignment
- Blog post: tuck key-tag into trust anchor paragraph

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: TCP single-write, mock server consistency, integration tests

- TCP single-write fix: combine length prefix + message to avoid split
  segments that Microsoft/Azure DNS servers reject
- Mock server (spawn_tcp_dns_server) updated to use single-write too
- Tests: forward_tcp_wire_format, forward_tcp_single_segment_write
- Integration: real-server checks for Microsoft/Office/Azure domains

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: recursive bar in dashboard, special-use domain interception

Dashboard:
- Add Recursive bar to resolution paths chart (cyan, distinct from Override)
- Add RECURSIVE path tag style in query log

Special-use domains (RFC 6761/6303/8880/9462):
- .localhost → 127.0.0.1 (RFC 6761)
- Private reverse PTR (10.x, 192.168.x, 172.16-31.x) → NXDOMAIN
- _dns.resolver.arpa (DDR) → NXDOMAIN
- ipv4only.arpa (NAT64) → 192.0.0.170/171
- mDNS service discovery for private ranges → NXDOMAIN

Eliminates ~900ms SERVFAILs for macOS system queries that were
hitting root servers unnecessarily.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: move generated blog HTML to site/blog/posts/, gitignore

- Generated HTML now in site/blog/posts/ (gitignored)
- CI workflow runs pandoc + make blog before deploy
- Updated all internal blog links to /blog/posts/ path
- blog/*.md remains the source of truth

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review feedback — memory ordering, RRSIG time, NS resolution

- Ordering::Relaxed → Acquire/Release for UDP_DISABLED/UDP_FAILURES
  (ARM correctness for cross-thread coordination)
- RRSIG time validation: serial number arithmetic (RFC 4034 §3.1.5)
  + 300s clock skew fudge factor (matches BIND)
- resolve_ns_addrs_from_glue collects addresses from ALL NS names,
  not just the first with glue (improves failover)
- is_special_use_domain: eliminate 16 format! allocations per
  .in-addr.arpa query (parse octet instead)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: API endpoint tests, coverage target

- 8 new axum handler tests: health, stats, query-log, overrides CRUD,
  cache, blocking stats, services CRUD, dashboard HTML
- Tests use tower::oneshot — no network, no server startup
- test_ctx() builds minimal ServerCtx for isolated testing
- `make coverage` target (cargo-tarpaulin), separate from `make all`
- 82 total tests (was 74)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 04:03:47 +02:00
Razvan Dimescu
c6b35045d8 config visibility, PR review fixes, XSS hardening
Config visibility:
- startup banner shows config path, data dir, services path
- config search: ./numa.toml → ~/.config/numa/ → /usr/local/var/numa/
- /stats API exposes config_path and data_dir, dashboard footer renders them
- GET /ca.pem endpoint serves CA cert for cross-device TLS trust
- load_config returns ConfigLoad with found flag, warns on not-found
- ServerCtx stores PathBuf for config_dir/data_dir, string conversion at boundaries

PR review fixes:
- add explicit parens in resolve_route operator precedence (service_store.rs)
- hostname portability: drop -s flag, trim domain with split('.') (lan.rs)
- serve_ca uses spawn_blocking instead of sync fs::read in async handler
- load_config: remove TOCTOU exists() check, read directly and handle NotFound

XSS hardening:
- HTML-escape all user-controlled interpolations in dashboard (service names,
  route paths, ports, URLs, block check domain/reason)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 12:24:21 +02:00
Razvan Dimescu
64c4d146ec add unit tests for route matching, config defaults, and service store
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 07:49:22 +02:00
Razvan Dimescu
5e5a6544bc LAN opt-in, mDNS migration, security hardening, path-based routing
- LAN discovery disabled by default (opt-in via [lan] enabled = true)
- Replace custom JSON multicast (239.255.70.78:5390) with standard mDNS
  (_numa._tcp.local on 224.0.0.251:5353) using existing DNS parser
- Instance ID in TXT record for multi-instance self-filtering
- API and proxy bind to 127.0.0.1 by default (0.0.0.0 when LAN enabled)
- Path-based routing: longest prefix match with optional prefix stripping
  via [[services]] routes = [{path, port, strip?}]
- REST API: GET/POST/DELETE /services/{name}/routes
- Dashboard shows route lines per service when configured
- Segment-boundary route matching (prevents /api matching /apiary)
- Route path validation (rejects path traversal)

Closes #11

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 06:56:31 +02:00
Razvan Dimescu
c9f1d98f45 add LAN service discovery via UDP multicast
Numa instances on the same network auto-discover each other's .numa
services. No config, no cloud — just multicast on 239.255.70.78:5390.

- PeerStore with lazy expiry (90s timeout, 30s broadcast interval)
- DNS resolves remote .numa services to peer's LAN IP (not localhost)
- Proxy forwards to peer IP for remote services
- Graceful degradation if multicast bind fails

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 16:45:46 +02:00
Razvan Dimescu
3bfcd827ac add TLS, service persistence, blocking panel, query types
- Local TLS: auto-generated CA + per-service certs (explicit SANs, not
  wildcards — browsers reject *.numa under single-label TLDs). HTTPS
  proxy on :443 via rustls/tokio-rustls. `numa install` trusts CA in
  macOS Keychain / Linux ca-certificates.
- Service persistence: user-added services saved to
  ~/.config/numa/services.json, survive restarts.
- Blocking panel: renamed "Check Domain" to "Blocking" with sources
  display, allowlist management UI, unpause button.
- Query types: recognize SOA, PTR, TXT, SRV, HTTPS (type 65) instead
  of logging as UNKNOWN.
- Blocklist gzip: reqwest now decompresses gzip responses from CDNs.
- Unified config_dir() in lib.rs for consistent path resolution under
  sudo and launchd. TLS certs use /usr/local/var/numa/ (writable as
  root daemon).
- Dashboard UX: panel subtitles differentiating overrides vs services,
  better placeholders, proxy route display, 600px query log height.
- Deploy: make deploy handles build+copy+codesign+restart cycle.
- Demo: scripts/record-demo.sh for recording hero GIF with CDP.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 01:15:07 +02:00
Razvan Dimescu
8f959ce0a5 add local service proxy with .numa domains
HTTP reverse proxy on port 80 lets developers use clean domain names
(frontend.numa, api.numa) instead of localhost:PORT. Includes WebSocket
upgrade support for HMR, TCP health checks, dashboard UI panel, and
REST API for service management. numa.numa is preconfigured for the
dashboard itself.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 15:07:15 +02:00
Razvan Dimescu
ee776938c5 add auto-detect upstream, install script, release workflow
- Default upstream auto-detected from system resolver (scutil/resolv.conf)
  instead of hardcoding Google 8.8.8.8. Falls back to Quad9 (9.9.9.9).
- Single scutil --dns pass for both upstream detection and forwarding rules
- Linux: reads backup resolv.conf if current only has loopback
- Service start/stop now couples DNS config (install on start, uninstall on stop)
- Install script for one-line binary install from GitHub Releases
- GitHub Actions release workflow: builds for macOS/Linux x86_64/aarch64

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 14:14:20 +02:00
Razvan Dimescu
4dc5b94c7a add ad blocking, live dashboard, system DNS auto-discovery
- DNS-level ad blocking: 385K+ domains via Hagezi Pro blocklist, subdomain
  matching, one-click allowlist, pause/toggle, background refresh every 24h
- Live dashboard at :5380 with real-time stats, query log, override
  management (create/edit/delete), blocking controls
- System DNS auto-discovery: parses scutil --dns on macOS to find
  conditional forwarding rules (Tailscale, VPN split-DNS)
- REST API expanded to 18 endpoints (blocking, overrides, diagnostics)
- Startup banner with colored system info
- Performance benchmarks (bench/dns-bench.sh)
- Landing page updated with new positioning and comparison table
- CI, Dockerfile, LICENSE, development plan docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 10:54:23 +02:00
Razvan Dimescu
89e7cbd989 add Makefile with clippy/rustfmt linting, fix all warnings
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 05:04:31 +02:00
Razvan Dimescu
9c71e9bb3f refactor to async tokio with modular architecture
- Replace synchronous std::net::UdpSocket with tokio async runtime
- Spawn concurrent task per incoming DNS query via tokio::spawn
- Extract monolithic main.rs into modules: buffer, header, question,
  record, packet, config, cache, forward, stats
- Share state across tasks via Arc<ServerCtx> with scoped Mutex locks
- Add TOML config loading, TTL-aware cache, structured logging, stats
- Add CLAUDE.md, README, dns_fun.toml config, and design docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:50:16 +02:00