# Benchmarks
Numa has two benchmark suites measuring different layers of performance.
## Micro-benchmarks (`benches/`, criterion)

Nanosecond-precision measurement of individual operations on the hot path. No running server required — these are pure Rust unit-level benchmarks.

```sh
cargo bench                       # run all
cargo bench --bench hot_path      # parse, serialize, cache, clone
cargo bench --bench throughput    # pipeline QPS, buffer alloc
```
### What's measured

**hot_path** — individual operations:
| Benchmark | What it measures |
|---|---|
| `buffer_parse` | Wire bytes → `DnsPacket` (typical response with 4 records) |
| `buffer_serialize` | `DnsPacket` → wire bytes |
| `packet_clone` | Full `DnsPacket` clone (what a cache hit costs) |
| `cache_lookup_hit` | Cache lookup on a single-entry cache |
| `cache_lookup_hit_populated` | Cache lookup with 1000 entries |
| `cache_lookup_miss` | HashMap miss (baseline) |
| `cache_insert` | Insert into cache with packet clone |
| `round_trip_cached` | Full cached path: parse query → cache hit → serialize response |
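The cached path that `round_trip_cached` times can be sketched with stand-in types. Everything below is a hypothetical simplification for illustration: the real `DnsPacket` and `DnsCache` live in the Numa crate and differ in detail.

```rust
use std::collections::HashMap;
use std::time::Instant;

// Hypothetical stand-ins for Numa's real types, only to show the shape
// of the cached path (parse query -> cache hit -> serialize response).
#[derive(Clone)]
struct DnsPacket {
    bytes: Vec<u8>,
}

struct DnsCache {
    map: HashMap<String, DnsPacket>,
}

impl DnsCache {
    // Takes &self: a lookup is read-only, so concurrent readers can share it.
    fn lookup(&self, domain: &str) -> Option<DnsPacket> {
        // A cache hit pays for a full packet clone (what packet_clone measures).
        self.map.get(domain).cloned()
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert(
        "example.com".to_string(),
        DnsPacket { bytes: vec![0u8; 512] },
    );
    let cache = DnsCache { map };

    // Crude timing loop; criterion does this properly, with warm-up,
    // outlier rejection, and statistical comparison against baselines.
    let iters: u32 = 100_000;
    let mut total = 0usize;
    let start = Instant::now();
    for _ in 0..iters {
        let pkt = cache.lookup("example.com").expect("entry is present");
        total += pkt.bytes.len(); // stand-in for "serialize response"
    }
    let elapsed = start.elapsed();
    println!(
        "{} cached lookups in {:?} (~{} ns/op), {} bytes total",
        iters,
        elapsed,
        elapsed.as_nanos() / u128::from(iters),
        total
    );
}
```

Note the clone on every hit: that is why `packet_clone` is benchmarked separately, as it bounds how fast the cached path can ever be.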
**throughput** — pipeline capacity:
| Benchmark | What it measures |
|---|---|
| `pipeline_throughput/N` | N cached queries end-to-end (parse → lookup → serialize) |
| `buffer_alloc` | `BytePacketBuffer` 4 KB zero-init cost |
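What `buffer_alloc` pays for is zeroing a fixed 4 KiB array on construction. A hypothetical sketch of such a buffer (the real `BytePacketBuffer` in the Numa crate may carry different fields):

```rust
// Hypothetical sketch of a 4 KiB wire buffer like the one buffer_alloc times;
// field names and layout are assumptions, not Numa's actual definition.
struct BytePacketBuffer {
    buf: [u8; 4096], // zero-initializing these 4 KiB is the measured cost
    pos: usize,      // read/write cursor into the buffer
}

impl BytePacketBuffer {
    fn new() -> Self {
        BytePacketBuffer { buf: [0; 4096], pos: 0 }
    }
}

fn main() {
    let b = BytePacketBuffer::new();
    assert_eq!(b.buf.len(), 4096);
    assert!(b.buf.iter().all(|&byte| byte == 0));
    println!("fresh buffer: {} bytes, cursor at {}", b.buf.len(), b.pos);
}
```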
### Reading results

Criterion auto-compares against the previous run:

```
round_trip_cached  time:   [710.5 ns 715.2 ns 720.1 ns]
                   change: [-2.48% -1.85% -1.21%] (p = 0.00 < 0.05)
                   Performance has improved.
```

- The three values are [lower bound, estimate, upper bound] of the mean
- `change` shows the delta vs the last saved baseline
- HTML reports with charts: `target/criterion/report/index.html`
To save a named baseline for comparison:

```sh
cargo bench -- --save-baseline before
# ... make changes ...
cargo bench -- --baseline before
```
## End-to-end benchmark (`bench/dns-bench.sh`)

Real-world latency comparison using `dig` against a running Numa instance
and public resolvers. Measures millisecond-level latency, including network I/O.
```sh
# Start Numa first (default port 15353 for testing)
bash bench/dns-bench.sh [port] [rounds]
bash bench/dns-bench.sh 15353 20    # the defaults
```
### What's measured

- **Numa (cold)**: cache flushed before each query — measures upstream forwarding
- **Numa (cached)**: queries hit the cache — measures local processing
- **System / Google / Cloudflare / Quad9**: public resolver comparison
Results are saved to `bench/results.json`.
## When to use which
| Question | Use |
|---|---|
| Did my code change make parsing faster? | `cargo bench --bench hot_path` |
| Is the cached path still sub-microsecond? | `cargo bench --bench hot_path` (`round_trip_cached`) |
| How many queries/sec can we handle? | `cargo bench --bench throughput` |
| Is Numa still competitive with the system resolver? | `bench/dns-bench.sh` |
| Did upstream forwarding regress? | `bench/dns-bench.sh` |