fix: escape DNS label text per RFC 1035 §5.1 (closes #36) #54

Merged
razvandimescu merged 5 commits from fix/qname-label-escaping into main 2026-04-10 13:53:47 +08:00
razvandimescu commented 2026-04-10 04:42:57 +08:00 (Migrated from github.com)

Closes #36.
Closes #55.

Bug

read_qname pushed raw label bytes directly into the output string, producing ambiguous text for labels containing dots, backslashes, or non-printable bytes. fanf2 spotted this on HN:

Wire format [8]exa.mple[3]com[0] (two labels, first containing a literal 0x2E) was rendered as exa.mple.com, indistinguishable from three labels.

Code review of the first fix cut also surfaced that src/dnssec.rs::name_to_wire was a parallel implementation of the old write_qname splitting loop — it predated and duplicated the buffer.rs logic, and was similarly broken for escaped labels. Both sides of the DNSSEC canonical form pipeline were silently wrong in the same way; making read_qname correct exposed the divergence, so it's fixed here in the same PR (#55).

Fix

Both sides of the text representation now follow RFC 1035 §5.1, and DNSSEC canonical form delegates to the same code path.

read_qname — rendering wire bytes to text

  • literal . within a label → \.
  • literal \\\
  • bytes outside 0x21..=0x7E\DDD (3-digit decimal)
  • printable ASCII passes through unchanged (fast path, no regression)

write_qname — parsing text back to wire

  • \. produces a literal 0x2E inside the current label (not a separator)
  • \\ produces a literal 0x5C
  • \DDD produces the byte with that decimal value (0..=255)
  • unescaped . still separates labels; empty labels (leading/trailing/consecutive dots) still skipped
  • rejects trailing \, short \DD, and \DDD > 255
  • 63-byte label cap checked incrementally so oversized labels fail fast

Implementation is a streaming byte-level pass: reserve a length byte, write the label body directly into the buffer, backpatch the length via set(). Zero intermediate allocations on the common path. Byte-level scanning is safe for UTF-8 input because continuation bytes (0x80..=0xBF) never collide with ASCII . (0x2E) or \ (0x5C).

read_qname's \DDD rendering also inlines the decimal digits to avoid a per-byte format! allocation on non-printable input.

dnssec::name_to_wire — DNSSEC canonical form

Reimplemented on top of BytePacketBuffer::write_qname: scratch buffer → write_qname → copy filled bytes → post-walk to lowercase label bodies (length bytes untouched). Lowercasing has to happen post-escape-resolution because a decimal escape like \065 yields 0x41 ('A') which must become 'a' in canonical form per RFC 4034 §6.2.

Call sites in build_signed_data, record_to_canonical_wire, record_rdata_canonical, and nsec3_hash are unchanged — name_to_wire's public signature stays the same, infallible Vec<u8> return.

Impact

Low in practice — real-world domains don't contain dots in labels — but this is a correctness bug in the wire format parser and the DNSSEC canonical form builder, which is the core of the project. Correctness bugs take priority over practical impact.

Performance: streaming implementation means zero allocations on the common write_qname path, matching main. The first iteration of this PR went via a Vec<Vec<u8>> helper; code review caught that bench_buffer_serialize calls write_qname ~6 times per response, so that approach would have added ~24 extra allocations per response. The streaming rewrite (commit 2) fixes that.

Tests

18 new unit tests across src/buffer.rs (16) and src/dnssec.rs (2). buffer.rs had no tests before this PR.

Buffer — read side

  • read_plain_domain
  • read_label_with_literal_dot_is_escaped — fanf2's exact example
  • read_label_with_backslash_is_escaped
  • read_label_with_nonprintable_byte_uses_decimal_escape
  • read_label_with_space_uses_decimal_escape

Buffer — write side

  • write_plain_domain
  • write_escaped_dot_does_not_split_label
  • write_escaped_backslash
  • write_decimal_escape_yields_raw_byte
  • write_rejects_out_of_range_decimal_escape (\999)
  • write_rejects_trailing_backslash
  • write_rejects_short_decimal_escape (\1)

Buffer — round-trip

  • roundtrip_preserves_dot_in_label
  • roundtrip_preserves_backslash_in_label
  • roundtrip_preserves_nonprintable_byte
  • root_name_empty_and_dot_both_produce_single_zero

DNSSEC canonical form

  • name_to_wire_escaped_dot_in_label_is_not_a_separator
  • name_to_wire_decimal_escape_is_lowercased — verifies post-escape-resolution lowercasing

Existing name_to_wire_root, name_to_wire_domain, and ds_verification tests continue to pass unchanged.

Total test count: 142 → 160.

Test plan

  • cargo test --lib — 160 passed, 0 failed
  • cargo clippy -- -D warnings — clean
  • cargo fmt --check — clean
  • CI green on all three platforms

Commits

  1. fix: escape DNS label text per RFC 1035 §5.1 — initial fix in buffer.rs
  2. refactor: stream label writes directly into buffer — addresses review feedback on allocation overhead in write_qname
  3. fix: route dnssec::name_to_wire through write_qname for escape handling — addresses #55 (surfaced by the review)

Credit

fanf2 on HN.

🤖 Generated with Claude Code

Closes #36. Closes #55. ## Bug `read_qname` pushed raw label bytes directly into the output string, producing ambiguous text for labels containing dots, backslashes, or non-printable bytes. fanf2 [spotted this on HN](https://news.ycombinator.com/item?id=47612321): > Wire format `[8]exa.mple[3]com[0]` (two labels, first containing a literal `0x2E`) was rendered as `exa.mple.com`, indistinguishable from three labels. Code review of the first fix cut also surfaced that `src/dnssec.rs::name_to_wire` was a parallel implementation of the old `write_qname` splitting loop — it predated and duplicated the buffer.rs logic, and was similarly broken for escaped labels. Both sides of the DNSSEC canonical form pipeline were silently wrong in the same way; making `read_qname` correct exposed the divergence, so it's fixed here in the same PR (#55). ## Fix Both sides of the text representation now follow RFC 1035 §5.1, and DNSSEC canonical form delegates to the same code path. ### `read_qname` — rendering wire bytes to text - literal `.` within a label → `\.` - literal `\` → `\\` - bytes outside `0x21..=0x7E` → `\DDD` (3-digit decimal) - printable ASCII passes through unchanged (fast path, no regression) ### `write_qname` — parsing text back to wire - `\.` produces a literal `0x2E` inside the current label (not a separator) - `\\` produces a literal `0x5C` - `\DDD` produces the byte with that decimal value (`0..=255`) - unescaped `.` still separates labels; empty labels (leading/trailing/consecutive dots) still skipped - rejects trailing `\`, short `\DD`, and `\DDD > 255` - 63-byte label cap checked incrementally so oversized labels fail fast Implementation is a **streaming byte-level pass**: reserve a length byte, write the label body directly into the buffer, backpatch the length via `set()`. Zero intermediate allocations on the common path. Byte-level scanning is safe for UTF-8 input because continuation bytes (`0x80..=0xBF`) never collide with ASCII `.` (0x2E) or `\` (0x5C). `read_qname`'s `\DDD` rendering also inlines the decimal digits to avoid a per-byte `format!` allocation on non-printable input. ### `dnssec::name_to_wire` — DNSSEC canonical form Reimplemented on top of `BytePacketBuffer::write_qname`: scratch buffer → `write_qname` → copy filled bytes → post-walk to lowercase label bodies (length bytes untouched). Lowercasing has to happen post-escape-resolution because a decimal escape like `\065` yields `0x41` (`'A'`) which must become `'a'` in canonical form per RFC 4034 §6.2. Call sites in `build_signed_data`, `record_to_canonical_wire`, `record_rdata_canonical`, and `nsec3_hash` are unchanged — `name_to_wire`'s public signature stays the same, infallible `Vec<u8>` return. ## Impact Low in practice — real-world domains don't contain dots in labels — but this is a correctness bug in the wire format parser and the DNSSEC canonical form builder, which is the core of the project. Correctness bugs take priority over practical impact. Performance: streaming implementation means **zero allocations on the common `write_qname` path**, matching `main`. The first iteration of this PR went via a `Vec<Vec<u8>>` helper; code review caught that `bench_buffer_serialize` calls `write_qname` ~6 times per response, so that approach would have added ~24 extra allocations per response. The streaming rewrite (commit 2) fixes that. ## Tests 18 new unit tests across `src/buffer.rs` (16) and `src/dnssec.rs` (2). `buffer.rs` had no tests before this PR. **Buffer — read side** - `read_plain_domain` - `read_label_with_literal_dot_is_escaped` — fanf2's exact example - `read_label_with_backslash_is_escaped` - `read_label_with_nonprintable_byte_uses_decimal_escape` - `read_label_with_space_uses_decimal_escape` **Buffer — write side** - `write_plain_domain` - `write_escaped_dot_does_not_split_label` - `write_escaped_backslash` - `write_decimal_escape_yields_raw_byte` - `write_rejects_out_of_range_decimal_escape` (`\999`) - `write_rejects_trailing_backslash` - `write_rejects_short_decimal_escape` (`\1`) **Buffer — round-trip** - `roundtrip_preserves_dot_in_label` - `roundtrip_preserves_backslash_in_label` - `roundtrip_preserves_nonprintable_byte` - `root_name_empty_and_dot_both_produce_single_zero` **DNSSEC canonical form** - `name_to_wire_escaped_dot_in_label_is_not_a_separator` - `name_to_wire_decimal_escape_is_lowercased` — verifies post-escape-resolution lowercasing Existing `name_to_wire_root`, `name_to_wire_domain`, and `ds_verification` tests continue to pass unchanged. Total test count: 142 → 160. ## Test plan - [x] `cargo test --lib` — 160 passed, 0 failed - [x] `cargo clippy -- -D warnings` — clean - [x] `cargo fmt --check` — clean - [x] CI green on all three platforms ## Commits 1. `fix: escape DNS label text per RFC 1035 §5.1` — initial fix in `buffer.rs` 2. `refactor: stream label writes directly into buffer` — addresses review feedback on allocation overhead in `write_qname` 3. `fix: route dnssec::name_to_wire through write_qname for escape handling` — addresses #55 (surfaced by the review) ## Credit [fanf2](https://news.ycombinator.com/item?id=47612321) on HN. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign in to join this conversation.