perf: BytePacketBuffer::new() zero-initializes 4 KB on every allocation #56

Open
opened 2026-04-10 05:23:07 +08:00 by razvandimescu · 0 comments
razvandimescu commented 2026-04-10 05:23:07 +08:00 (Migrated from github.com)

Context

`BytePacketBuffer::new()` at `src/buffer.rs:17-22` materializes a zero-filled `[u8; 4096]` on every call:

```rust
pub fn new() -> BytePacketBuffer {
BytePacketBuffer {
buf: [0; BUF_SIZE],
pos: 0,
}
}
```

Stack-allocated, so no heap cost — but the zero-fill is a real cost on short-lived uses, and several paths create a fresh buffer per call:

  • `dnssec::name_to_wire` (one per record in an RRset during RRSIG verification)
  • Any other code that uses `BytePacketBuffer` as a scratch buffer for label/packet serialization

Impact estimate

Review of PR #54 (code review finding) estimated the zero-fill at `~60-100 cycles` on modern x86 (via `rep stosb`), which is ~2x the cost of the old `name_to_wire` fast path that did a single `alloc(len+2)` + 13 stores for a plain `example.com`. On `bench_name_to_wire` this likely shows up as a 1.8-2.2x regression on the short-name case introduced by PR #54.

The write amplification ratio for a typical DNS name is roughly `4096 / 30` = ~135x more bytes zero-filled than actually used.

Fix sketch

Add a `new_uninit` constructor backed by `MaybeUninit<[u8; 4096]>`:

```rust
pub fn new_uninit() -> BytePacketBuffer {
use std::mem::MaybeUninit;
let buf: [MaybeUninit; BUF_SIZE] = unsafe { MaybeUninit::uninit().assume_init() };
// SAFETY: BytePacketBuffer invariants: only bytes in 0..pos are ever
// observed via filled() or get_range(0..pos); anything past pos is
// undefined and must never be read. write_u8() and write_bytes()
// initialize bytes in strict forward order before advancing pos.
BytePacketBuffer {
buf: unsafe { std::mem::transmute::<_, [u8; BUF_SIZE]>(buf) },
pos: 0,
}
}
```

Or, cleaner with stable Rust, use `Box<[u8; BUF_SIZE]>` with `alloc_zeroed` replaced by `alloc` via `std::alloc::Layout` — but the MaybeUninit transmute is the zero-churn path.

What to audit before applying

Every reader API on `BytePacketBuffer`:

  • `get(pos)` — callers must guarantee `pos < self.pos` (currently only bounds-checks against `BUF_SIZE`, so it'd happily read uninit bytes)
  • `get_range(start, len)` — same
  • `filled()` — already returns only `&buf[..pos]`, safe
  • `read()` — reads at `self.pos` and advances; only called after seeking into a previously-written region, so safe by invariant but worth documenting

Might be worth auditing `read_qname`'s pointer jumps to confirm they can't jump beyond `self.pos`.

Alternative

Thread-local scratch buffer reused across calls:

```rust
thread_local! {
static SCRATCH: RefCell = RefCell::new(BytePacketBuffer::new());
}
```

Pros: zero per-call zero-fill after the first. No unsafe.
Cons: reset discipline (`buf.pos = 0` before each use) and borrow checker friction at call sites.

Discovered during

Code review of PR #54 (RFC 1035 label escaping). Not expanded in that PR to keep scope disciplined — it's a perf optimization that affects every `BytePacketBuffer` caller, not just `name_to_wire`.

## Context \`BytePacketBuffer::new()\` at \`src/buffer.rs:17-22\` materializes a zero-filled \`[u8; 4096]\` on every call: \`\`\`rust pub fn new() -> BytePacketBuffer { BytePacketBuffer { buf: [0; BUF_SIZE], pos: 0, } } \`\`\` Stack-allocated, so no heap cost — but the zero-fill is a real cost on short-lived uses, and several paths create a fresh buffer per call: - \`dnssec::name_to_wire\` (one per record in an RRset during RRSIG verification) - Any other code that uses \`BytePacketBuffer\` as a scratch buffer for label/packet serialization ## Impact estimate Review of PR #54 ([code review finding](https://github.com/razvandimescu/numa/pull/54)) estimated the zero-fill at \`~60-100 cycles\` on modern x86 (via \`rep stosb\`), which is ~**2x** the cost of the old \`name_to_wire\` fast path that did a single \`alloc(len+2)\` + 13 stores for a plain \`example.com\`. On \`bench_name_to_wire\` this likely shows up as a 1.8-2.2x regression on the short-name case introduced by PR #54. The write amplification ratio for a typical DNS name is roughly \`4096 / 30\` = **~135x** more bytes zero-filled than actually used. ## Fix sketch Add a \`new_uninit\` constructor backed by \`MaybeUninit<[u8; 4096]>\`: \`\`\`rust pub fn new_uninit() -> BytePacketBuffer { use std::mem::MaybeUninit; let buf: [MaybeUninit<u8>; BUF_SIZE] = unsafe { MaybeUninit::uninit().assume_init() }; // SAFETY: BytePacketBuffer invariants: only bytes in 0..pos are ever // observed via filled() or get_range(0..pos); anything past pos is // undefined and must never be read. write_u8() and write_bytes() // initialize bytes in strict forward order before advancing pos. BytePacketBuffer { buf: unsafe { std::mem::transmute::<_, [u8; BUF_SIZE]>(buf) }, pos: 0, } } \`\`\` Or, cleaner with stable Rust, use \`Box<[u8; BUF_SIZE]>\` with \`alloc_zeroed\` replaced by \`alloc\` via \`std::alloc::Layout\` — but the MaybeUninit transmute is the zero-churn path. ## What to audit before applying Every reader API on \`BytePacketBuffer\`: - \`get(pos)\` — callers must guarantee \`pos < self.pos\` (currently only bounds-checks against \`BUF_SIZE\`, so it'd happily read uninit bytes) - \`get_range(start, len)\` — same - \`filled()\` — already returns only \`&buf[..pos]\`, safe - \`read()\` — reads at \`self.pos\` and advances; only called after seeking into a previously-written region, so safe by invariant but worth documenting Might be worth auditing \`read_qname\`'s pointer jumps to confirm they can't jump beyond \`self.pos\`. ## Alternative Thread-local scratch buffer reused across calls: \`\`\`rust thread_local! { static SCRATCH: RefCell<BytePacketBuffer> = RefCell::new(BytePacketBuffer::new()); } \`\`\` Pros: zero per-call zero-fill after the first. No unsafe. Cons: reset discipline (\`buf.pos = 0\` before each use) and borrow checker friction at call sites. ## Discovered during Code review of PR #54 (RFC 1035 label escaping). Not expanded in that PR to keep scope disciplined — it's a perf optimization that affects every \`BytePacketBuffer\` caller, not just \`name_to_wire\`.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: dearsky/numa#56