perf: BytePacketBuffer::new() zero-initializes 4 KB on every allocation #56
Context
`BytePacketBuffer::new()` at `src/buffer.rs:17-22` materializes a zero-filled `[u8; 4096]` on every call:
```rust
pub fn new() -> BytePacketBuffer {
    BytePacketBuffer {
        buf: [0; BUF_SIZE],
        pos: 0,
    }
}
```
The buffer is stack-allocated, so there is no heap cost, but the zero-fill itself is a real cost on short-lived uses, and several paths create a fresh buffer per call.
Impact estimate
The review of PR #54 estimated the zero-fill at `~60-100 cycles` on modern x86 (via `rep stosb`), roughly 2x the cost of the old `name_to_wire` fast path, which did a single `alloc(len+2)` plus 13 stores for a plain `example.com`. On `bench_name_to_wire` this likely shows up as a 1.8-2.2x regression on the short-name case introduced by PR #54.
The write amplification ratio for a typical DNS name is roughly `4096 / 30` = ~135x more bytes zero-filled than actually used.
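A quick sanity check of the byte counts cited above. The `name_to_wire` here is a generic length-prefixed DNS label encoder written for illustration, not the crate's actual implementation:

```rust
// Standard DNS wire format: each label is prefixed by its length,
// and the name is terminated by a zero octet (RFC 1035 §3.1).
fn name_to_wire(name: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(name.len() + 2);
    for label in name.split('.') {
        out.push(label.len() as u8);
        out.extend_from_slice(label.as_bytes());
    }
    out.push(0);
    out
}

fn main() {
    // "example.com" -> 1 + 7 + 1 + 3 + 1 = 13 bytes: the "13 stores" figure.
    assert_eq!(name_to_wire("example.com").len(), 13);
    // Amplification for a ~30-byte typical name: 4096 / 30 -> ~135x.
    assert_eq!(4096 / 30, 136);
}
```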
Fix sketch
Add a `new_uninit` constructor backed by `MaybeUninit<[u8; 4096]>`:
```rust
pub fn new_uninit() -> BytePacketBuffer {
    use std::mem::MaybeUninit;
    // SAFETY: an array of MaybeUninit<u8> is trivially "initialized".
    let buf: [MaybeUninit<u8>; BUF_SIZE] = unsafe { MaybeUninit::uninit().assume_init() };
    // SAFETY: BytePacketBuffer invariants: only bytes in 0..pos are ever
    // observed via filled() or get_range(0..pos); anything past pos is
    // undefined and must never be read. write_u8() and write_bytes()
    // initialize bytes in strict forward order before advancing pos.
    BytePacketBuffer {
        buf: unsafe { std::mem::transmute::<[MaybeUninit<u8>; BUF_SIZE], [u8; BUF_SIZE]>(buf) },
        pos: 0,
    }
}
```
Alternatively, on stable Rust, back the buffer with a `Box<[u8; BUF_SIZE]>` allocated via `std::alloc::alloc` (rather than `alloc_zeroed`) through a `std::alloc::Layout`. That avoids the transmute, but the `MaybeUninit` route is the zero-churn path since it keeps the existing stack-allocated layout.
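A minimal sketch of that boxed alternative, assuming nothing about the crate's internals (`new_boxed_uninit` is a hypothetical name). The caller still carries the same invariant: never read past the initialized prefix.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};

const BUF_SIZE: usize = 4096;

/// Heap-allocate a 4 KB buffer without zero-filling it.
/// The contents are uninitialized; callers must write before reading.
fn new_boxed_uninit() -> Box<[u8; BUF_SIZE]> {
    let layout = Layout::new::<[u8; BUF_SIZE]>();
    unsafe {
        let ptr = alloc(layout) as *mut [u8; BUF_SIZE];
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // SAFETY: ptr was allocated with the exact layout of [u8; BUF_SIZE],
        // so Box::from_raw will free it with the matching layout on drop.
        Box::from_raw(ptr)
    }
}

fn main() {
    let mut b = new_boxed_uninit();
    b[0] = 7; // initialize before reading
    assert_eq!(b[0], 7);
}
```

The trade-off versus the transmute sketch: one heap allocation per buffer, but no `MaybeUninit` juggling and no change to how readers are bounded.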
What to audit before applying
Every reader API on `BytePacketBuffer` needs a check that it can never observe bytes at or past `pos`. In particular, it is worth auditing `read_qname`'s compression-pointer jumps to confirm they cannot land beyond `self.pos`.
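The invariant the audit has to establish can be captured as a bounds guard on every read path. This is a sketch of the shape such a guard might take; `get_range` and its signature are assumptions, not the crate's actual API:

```rust
const BUF_SIZE: usize = 4096;

struct BytePacketBuffer {
    buf: [u8; BUF_SIZE],
    pos: usize,
}

impl BytePacketBuffer {
    /// Every read must stay inside the initialized prefix 0..pos.
    /// Returning None instead of slicing past pos is what makes
    /// the uninitialized tail unobservable.
    fn get_range(&self, start: usize, len: usize) -> Option<&[u8]> {
        let end = start.checked_add(len)?; // reject overflow, not just overrun
        if end > self.pos {
            return None; // would expose bytes past pos
        }
        Some(&self.buf[start..end])
    }
}

fn main() {
    let b = BytePacketBuffer { buf: [0; BUF_SIZE], pos: 3 };
    assert!(b.get_range(0, 3).is_some()); // inside the prefix
    assert!(b.get_range(0, 4).is_none()); // one past pos: rejected
    assert!(b.get_range(2, usize::MAX).is_none()); // overflow: rejected
}
```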
Alternative
Thread-local scratch buffer reused across calls:
```rust
use std::cell::RefCell;

thread_local! {
    // One scratch buffer per thread, zero-filled once at first use.
    static SCRATCH: RefCell<BytePacketBuffer> = RefCell::new(BytePacketBuffer::new());
}
```
Pros: zero per-call zero-fill after the first. No unsafe.
Cons: reset discipline (`buf.pos = 0` before each use) and borrow checker friction at call sites.
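One way to contain both cons is a wrapper that owns the reset, so call sites can neither forget `pos = 0` nor hold the borrow across calls. `with_scratch` is a hypothetical helper, sketched against a minimal stand-in for the real struct:

```rust
use std::cell::RefCell;

const BUF_SIZE: usize = 4096;

struct BytePacketBuffer {
    buf: [u8; BUF_SIZE],
    pos: usize,
}

impl BytePacketBuffer {
    fn new() -> Self {
        BytePacketBuffer { buf: [0; BUF_SIZE], pos: 0 }
    }
    fn write_u8(&mut self, v: u8) {
        self.buf[self.pos] = v;
        self.pos += 1;
    }
}

thread_local! {
    static SCRATCH: RefCell<BytePacketBuffer> = RefCell::new(BytePacketBuffer::new());
}

/// Centralizes the reset discipline: the closure always sees pos == 0,
/// and the RefCell borrow cannot escape the call.
fn with_scratch<R>(f: impl FnOnce(&mut BytePacketBuffer) -> R) -> R {
    SCRATCH.with(|cell| {
        let mut buf = cell.borrow_mut();
        buf.pos = 0; // stale bytes past pos are never read, so no re-zeroing
        f(&mut buf)
    })
}

fn main() {
    let n = with_scratch(|b| {
        b.write_u8(0xAB);
        b.pos
    });
    assert_eq!(n, 1);
    // The next use starts clean even though the old bytes persist.
    assert_eq!(with_scratch(|b| b.pos), 0);
}
```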
Discovered during
Code review of PR #54 (RFC 1035 label escaping). Not folded into that PR to keep its scope tight: this is a perf optimization that affects every `BytePacketBuffer` caller, not just `name_to_wire`.