perf: optimize DNS query hot path (#15)

* perf: optimize hot path — RwLock, inline filtering, pre-allocated strings

- Mutex → RwLock for cache, blocklist, and overrides (concurrent read access)
- Make cache.lookup() and overrides.lookup() take &self (read-only)
- Eliminate 3 Vec allocations per DnsPacket::write() via inline filtering
- Pre-allocate domain strings with capacity 64 in parse path
- Add criterion micro-benchmarks (hot_path + throughput)
- Add bench README documenting both benchmark suites

Measured improvement: ~14% faster parsing, ~9% pipeline throughput,
round-trip cached 733ns → 698ns (~2.3M queries/sec).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: simplify benchmark code after review

- Remove redundant DnsHeader::new() (already set by DnsPacket::new())
- Remove unused DnsHeader import
- Change simulate_cached_pipeline to take &DnsCache (lookup is &self now)
- Remove unnecessary mut on cache in cache_lookup_miss bench

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Razvan Dimescu
2026-03-27 02:01:08 +02:00
committed by GitHub
parent 1f4063d5db
commit 962b400f4c
13 changed files with 728 additions and 77 deletions

View File

@@ -1,5 +1,5 @@
use std::net::SocketAddr;
use std::sync::{Arc, Mutex};
use std::sync::{Arc, Mutex, RwLock};
use std::time::Duration;
use arc_swap::ArcSwap;
@@ -170,14 +170,14 @@ async fn main() -> numa::Result<()> {
let ctx = Arc::new(ServerCtx {
socket: UdpSocket::bind(&config.server.bind_addr).await?,
zone_map: build_zone_map(&config.zones)?,
cache: Mutex::new(DnsCache::new(
cache: RwLock::new(DnsCache::new(
config.cache.max_entries,
config.cache.min_ttl,
config.cache.max_ttl,
)),
stats: Mutex::new(ServerStats::new()),
overrides: Mutex::new(OverrideStore::new()),
blocklist: Mutex::new(blocklist),
overrides: RwLock::new(OverrideStore::new()),
blocklist: RwLock::new(blocklist),
query_log: Mutex::new(QueryLog::new(1000)),
services: Mutex::new(service_store),
lan_peers: Mutex::new(numa::lan::PeerStore::new(config.lan.peer_timeout_secs)),
@@ -541,7 +541,7 @@ async fn load_blocklists(ctx: &ServerCtx, lists: &[String]) {
// Swap under lock — sub-microsecond
ctx.blocklist
.lock()
.write()
.unwrap()
.swap_domains(all_domains, sources);
info!(