# Benchmarks

Numa has two benchmark suites measuring different layers of performance.
## Micro-benchmarks (`benches/`, criterion)

Nanosecond-precision measurement of individual operations on the hot path. No running server required — these are pure Rust unit-level benchmarks.
```sh
cargo bench                      # run all
cargo bench --bench hot_path     # parse, serialize, cache, clone
cargo bench --bench throughput   # pipeline QPS, buffer alloc
```
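Conceptually, each criterion benchmark times a closure over many iterations and reports the mean. A hand-rolled, std-only sketch of that idea (not the actual criterion harness; `parse_stub` is a hypothetical stand-in for the real `DnsPacket` parsing):

```rust
use std::time::Instant;

// Hypothetical stand-in for the real hot-path operation (e.g. wire-format parsing).
fn parse_stub(wire: &[u8]) -> usize {
    wire.iter().map(|&b| b as usize).sum()
}

fn main() {
    let wire = vec![0xABu8; 512]; // fake 512-byte DNS response
    let iters = 100_000u32;

    let start = Instant::now();
    let mut sink = 0usize;
    for _ in 0..iters {
        // std::hint::black_box keeps the optimizer from deleting the work,
        // the same trick criterion's black_box uses.
        sink = sink.wrapping_add(std::hint::black_box(parse_stub(&wire)));
    }
    let elapsed = start.elapsed();

    // Mean nanoseconds per iteration — roughly what criterion reports as the estimate.
    let ns_per_iter = elapsed.as_nanos() as f64 / iters as f64;
    println!("{} iters, {:.1} ns/iter (sink={})", iters, ns_per_iter, sink);
}
```

Criterion adds warm-up, outlier detection, and statistical bounds on top of this loop, which is why its numbers are trustworthy at nanosecond scale.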
### What's measured

`hot_path` — individual operations:
| Benchmark | What it measures |
|---|---|
| `buffer_parse` | Wire bytes → `DnsPacket` (typical response with 4 records) |
| `buffer_serialize` | `DnsPacket` → wire bytes |
| `packet_clone` | Full `DnsPacket` clone (what a cache hit costs) |
| `cache_lookup_hit` | Cache lookup on a single-entry cache |
| `cache_lookup_hit_populated` | Cache lookup with 1000 entries |
| `cache_lookup_miss` | HashMap miss (baseline) |
| `cache_insert` | Insert into cache with packet clone |
| `round_trip_cached` | Full cached path: parse query → cache hit → serialize response |
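The `packet_clone` and cache-hit rows are linked: a hit returns an owned copy of the stored packet, so the clone is part of the hit cost. A simplified, std-only sketch of that pattern (`Packet` is a hypothetical stand-in for `DnsPacket`, not the real type):

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified stand-in for DnsPacket.
#[derive(Clone)]
struct Packet {
    answers: Vec<(String, u32)>, // (domain, TTL)
}

struct Cache {
    entries: HashMap<String, Packet>,
}

impl Cache {
    fn new() -> Self {
        Cache { entries: HashMap::new() }
    }

    // Read-only lookup (&self), returning a clone of the stored packet —
    // the clone is the "what a cache hit costs" from the table above.
    fn lookup(&self, domain: &str) -> Option<Packet> {
        self.entries.get(domain).cloned()
    }

    fn insert(&mut self, domain: String, packet: Packet) {
        self.entries.insert(domain, packet);
    }
}

fn main() {
    let mut cache = Cache::new();
    cache.insert(
        "example.com".into(),
        Packet { answers: vec![("example.com".into(), 300)] },
    );
    // Hit: clone of the stored packet. Miss: a plain HashMap miss (the baseline row).
    assert!(cache.lookup("example.com").is_some());
    assert!(cache.lookup("missing.test").is_none());
    println!("hit and miss behave as expected");
}
```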
`throughput` — pipeline capacity:
| Benchmark | What it measures |
|---|---|
| `pipeline_throughput/N` | N cached queries end-to-end (parse → lookup → serialize) |
| `buffer_alloc` | `BytePacketBuffer` 4KB zero-init cost |
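The `buffer_alloc` row measures the price of handing out a fresh, zeroed 4KB buffer per packet. A minimal illustration of the two common shapes of that allocation (the real `BytePacketBuffer` layout is not reproduced here — this is only an assumption about what the zero-init entails):

```rust
fn main() {
    // Stack: zero-initializing a fixed array — what a struct along the lines of
    // `struct BytePacketBuffer { buf: [u8; 4096], pos: usize }` would pay.
    let stack_buf = [0u8; 4096];

    // Heap: same zeroing cost, plus an allocator round trip.
    let heap_buf = vec![0u8; 4096];

    assert_eq!(stack_buf.len(), 4096);
    assert_eq!(heap_buf.len(), 4096);
    assert!(stack_buf.iter().all(|&b| b == 0));
    println!("both buffers are 4096 zeroed bytes");
}
```

At ~2M queries/sec this zeroing happens millions of times per second, which is why it earns its own benchmark row.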
### Reading results
Criterion auto-compares against the previous run:
```
round_trip_cached    time:   [710.5 ns 715.2 ns 720.1 ns]
                     change: [-2.48% -1.85% -1.21%] (p = 0.00 < 0.05)
                     Performance has improved.
```
- The three values are `[lower bound, estimate, upper bound]` of the mean
- `change` shows the delta vs the last saved baseline
- HTML reports with charts: `target/criterion/report/index.html`
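The `change` percentage is just the relative delta between the two runs' means. A tiny hypothetical helper (not part of the codebase) makes the arithmetic explicit — for the sample output above, a baseline mean of roughly 728.7 ns dropping to 715.2 ns shows up as about -1.85%:

```rust
// Relative change between two mean timings, as a percentage.
// Negative means the new run is faster.
fn percent_change(before_ns: f64, after_ns: f64) -> f64 {
    (after_ns - before_ns) / before_ns * 100.0
}

fn main() {
    let delta = percent_change(728.7, 715.2);
    println!("{:.2}%", delta); // prints "-1.85%"
    assert!(delta < 0.0); // an improvement
}
```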
To save a named baseline for comparison:
```sh
cargo bench -- --save-baseline before
# ... make changes ...
cargo bench -- --baseline before
```
## End-to-end benchmark (`bench/dns-bench.sh`)
Real-world latency comparison using `dig` against a running Numa instance and public resolvers. Measures millisecond-level latency including network I/O.
```sh
# Start Numa first (default port 15353 for testing)
bash bench/dns-bench.sh [port] [rounds]
bash bench/dns-bench.sh 15353 20   # defaults
```
### What's measured
- Numa (cold): cache flushed before each query — measures upstream forwarding
- Numa (cached): queries hit cache — measures local processing
- System / Google / Cloudflare / Quad9: public resolver comparison
Results are saved to `bench/results.json`.
## When to use which
| Question | Use |
|---|---|
| Did my code change make parsing faster? | `cargo bench --bench hot_path` |
| Is the cached path still sub-microsecond? | `cargo bench --bench hot_path` (`round_trip_cached`) |
| How many queries/sec can we handle? | `cargo bench --bench throughput` |
| Is Numa still competitive with the system resolver? | `bench/dns-bench.sh` |
| Did upstream forwarding regress? | `bench/dns-bench.sh` |