From e860731c01bd73706120884adcdd7f91908c5283 Mon Sep 17 00:00:00 2001
From: Razvan Dimescu <ssaricu@gmail.com>
Date: Fri, 10 Apr 2026 08:53:46 +0300
Subject: [PATCH] =?UTF-8?q?fix:=20escape=20DNS=20label=20text=20per=20RFC?=
 =?UTF-8?q?=201035=20=C2=A75.1=20(closes=20#36)=20(#54)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* fix: escape dots and special characters in DNS label text per RFC 1035 §5.1

Closes #36

read_qname was pushing raw label bytes directly into the output string,
producing ambiguous text for labels containing dots, backslashes, or
non-printable bytes. fanf2 spotted this on HN: wire format
`[8]exa.mple[3]com[0]` (two labels, first containing a literal 0x2E)
was rendered as `exa.mple.com`, indistinguishable from three labels.

Fix both sides of the text representation per RFC 1035 §5.1:

read_qname — when rendering wire bytes to text:
- literal `.` within a label → `\.`
- literal `\` → `\\`
- bytes outside 0x21..=0x7E → `\DDD` (3-digit decimal)
- printable ASCII passes through unchanged

write_qname — when parsing text back to wire:
- `\.` produces a literal 0x2E inside the current label (not a separator)
- `\\` produces a literal 0x5C
- `\DDD` produces the byte with that decimal value (0..=255)
- unescaped `.` still separates labels, empty labels still skipped
- rejects trailing `\`, short `\DD`, and `\DDD` > 255

Impact in practice is low — real-world domains don't contain dots in
labels — but it's a correctness bug in the wire format parser that
could cause round-trip failures with adversarial input. The parser is
the core of the project, so correctness bugs take priority over
practical impact.

Adds 16 unit tests in a new `#[cfg(test)] mod tests` block covering:
plain domain read/write, literal-dot escaping on both sides, backslash
escaping, non-printable + space decimal escapes, full round-trip
preservation, and the three rejection cases for malformed escapes.

Credit: fanf2 (https://news.ycombinator.com/item?id=47612321)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: stream label writes directly into buffer (review feedback)

The first cut of this fix delegated write_qname to a helper
(parse_escaped_labels) that built Vec<Vec<u8>> up-front, then iterated
to emit the wire bytes. On a plain-ASCII domain like "www.google.com"
that's ~4 heap allocations per write_qname call, and record.rs calls
write_qname ~6 times per response — so this PR would regress
bench_buffer_serialize by roughly 24 extra allocations per response
vs. main, where the old non-escaping code had zero.

Rewrite write_qname as a streaming byte-level loop that reserves the
length byte up-front, writes the label body directly into the buffer,
then backpatches the length via set(). Zero intermediate allocations
on the common path, and the 63-byte label cap is now checked
incrementally so oversized labels fail fast.

Byte-level scanning is safe for UTF-8 input: continuation bytes are
always in 0x80..=0xBF, so they can never collide with the ASCII `.`
(0x2E) or `\` (0x5C) that drive label splitting and escape parsing.

Also inline the \DDD rendering in read_qname to avoid the per-byte
format!() allocation on non-printable input. Plain-ASCII reads hit
the unchanged push(c as char) fast path, so the common case has zero
regression.

The parse_escaped_labels helper is deleted — no remaining callers.

All 158 tests pass, clippy + fmt clean. Collapses three review
findings (HIGH allocation regression, MEDIUM format! allocation,
MEDIUM .unwrap() after digit guard) in one pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: route dnssec::name_to_wire through write_qname for escape handling

Closes #55.

dnssec::name_to_wire was a parallel implementation of the old
write_qname's splitting loop — it iterated qname.split('.') and pushed
raw bytes. It predated and duplicated the buffer.rs logic, and it did
not understand RFC 1035 §5.1 text escapes. After the read_qname fix in
this PR, names that come out of read_qname can contain \., \\, or
\DDD sequences; feeding those back into the old name_to_wire would
split on the literal '.' inside a \. sequence and produce corrupt
RRSIG signed-data blobs.

The underlying bug predates this PR — the old read_qname was broken
too, so both sides of the DNSSEC canonical form pipeline were
silently wrong in the same way. Making read_qname correct exposed the
divergence, so it's fixed here in the same PR that introduced the
exposure.

Reimplement name_to_wire on top of BytePacketBuffer::write_qname:
reserve a scratch buffer, let write_qname handle the escape parsing
and length-byte framing, copy the emitted bytes into a Vec, then
walk the wire once more to lowercase label bodies (length bytes stay
untouched). Canonical form per RFC 4034 §6.2 requires the
lowercasing, and it has to happen post-escape-resolution — a
decimal escape like \065 produces 0x41 ('A'), which must be
lowercased to 'a' in the final wire bytes.

Call sites in build_signed_data, record_to_canonical_wire,
record_rdata_canonical, and nsec3_hash are unchanged — the public
signature stays the same, infallible Vec<u8> return.

Tests:
- name_to_wire_escaped_dot_in_label_is_not_a_separator — verifies
  the fanf2 example round-trips correctly through canonical form
- name_to_wire_decimal_escape_is_lowercased — verifies post-escape
  lowercasing (the subtle correctness requirement)
- existing name_to_wire_root, name_to_wire_domain, ds_verification
  tests still pass unchanged

Test count: 158 → 160.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: tighten name_to_wire per review feedback

- Replace hand-rolled per-byte lowercase loop with stdlib
  [u8]::make_ascii_lowercase(). Shorter and idiomatic.
- Tighten the .expect() message to state the actual invariant
  (parseable DNS name) instead of vague "well-formed" language.
- Replace the doc comment's "see #55" with the real invariant —
  issue numbers rot, and by merge time #55 is closed anyway. The
  comment now explains WHY the lowercase pass has to happen
  post-escape-resolution (\065 → 'A' → 'a') instead of during
  write_qname.
- Drop the redundant `\065` test comment (the one-liner version
  is enough with the assertion showing the transform).

No behavior change; 160 tests still pass, clippy + fmt clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: cover label cap and empty-label rollback; trim doc comments

Closes coverage gaps left by PR #54:

- write_rejects_label_over_63_bytes: pins the incremental 63-byte cap
  inside write_qname's inner loop (boundary at 63 vs 64).
- write_skips_empty_labels: pins the rollback branch (pos = len_pos)
  triggered by leading or consecutive dots.

Doc comments tightened:

- write_qname: drop the streaming-impl walkthrough and the escape-grammar
  restatement (already documented on read_qname).
- name_to_wire: drop the implementation explanation; keep the
  post-escape lowercasing rationale, which pins behavior a future
  refactor could silently break.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 src/buffer.rs | 239 +++++++++++++++++++++++++++++++++++++++++++++++---
 src/dnssec.rs |  51 ++++++++---
 2 files changed, 266 insertions(+), 24 deletions(-)
diff --git a/src/buffer.rs b/src/buffer.rs
index 5db470c..4e954c9 100644
--- a/src/buffer.rs
+++ b/src/buffer.rs
@@ -84,6 +84,11 @@ impl BytePacketBuffer {
 
     /// Read a qname, handling label compression (pointer jumps).
     /// Converts wire format like [3]www[6]google[3]com[0] into "www.google.com".
+    ///
+    /// Label bytes are escaped per RFC 1035 §5.1:
+    /// - literal `.` within a label → `\.`
+    /// - literal `\` → `\\`
+    /// - bytes outside `0x21..=0x7E` (excluding `.` and `\`) → `\DDD` (3-digit decimal)
     pub fn read_qname(&mut self, outstr: &mut String) -> Result<()> {
         let mut pos = self.pos();
         let mut jumped = false;
@@ -121,7 +126,18 @@ impl BytePacketBuffer {
 
                 let str_buffer = self.get_range(pos, len as usize)?;
                 for &b in str_buffer {
-                    outstr.push(b.to_ascii_lowercase() as char);
+                    let c = b.to_ascii_lowercase();
+                    match c {
+                        b'.' => outstr.push_str("\\."),
+                        b'\\' => outstr.push_str("\\\\"),
+                        0x21..=0x7E => outstr.push(c as char),
+                        _ => {
+                            outstr.push('\\');
+                            outstr.push((b'0' + c / 100) as char);
+                            outstr.push((b'0' + (c / 10) % 10) as char);
+                            outstr.push((b'0' + c % 10) as char);
+                        }
+                    }
                 }
 
                 delim = ".";
@@ -163,24 +179,68 @@ impl BytePacketBuffer {
         Ok(())
     }
 
+    /// Write a qname in wire format, parsing RFC 1035 §5.1 text escapes.
+    /// See `read_qname` for the escape grammar.
     pub fn write_qname(&mut self, qname: &str) -> Result<()> {
         if qname.is_empty() || qname == "." {
             self.write_u8(0)?;
             return Ok(());
         }
 
-        for label in qname.split('.') {
-            let len = label.len();
-            if len == 0 {
-                continue; // skip empty labels from trailing dot
-            }
-            if len > 0x3f {
-                return Err("Single label exceeds 63 characters of length".into());
+        let bytes = qname.as_bytes();
+        let mut i = 0;
+        while i < bytes.len() {
+            let len_pos = self.pos;
+            self.write_u8(0)?; // placeholder length byte, backpatched below
+            let body_start = self.pos;
+
+            while i < bytes.len() && bytes[i] != b'.' {
+                let b = bytes[i];
+                if b == b'\\' {
+                    i += 1;
+                    let c1 = *bytes.get(i).ok_or("trailing backslash in qname")?;
+                    if c1.is_ascii_digit() {
+                        let c2 = *bytes
+                            .get(i + 1)
+                            .ok_or("invalid \\DDD escape: expected 3 digits")?;
+                        let c3 = *bytes
+                            .get(i + 2)
+                            .ok_or("invalid \\DDD escape: expected 3 digits")?;
+                        if !c2.is_ascii_digit() || !c3.is_ascii_digit() {
+                            return Err("invalid \\DDD escape: expected 3 digits".into());
+                        }
+                        let val =
+                            (c1 - b'0') as u16 * 100 + (c2 - b'0') as u16 * 10 + (c3 - b'0') as u16;
+                        if val > 255 {
+                            return Err(format!("\\DDD escape out of range: {}", val).into());
+                        }
+                        self.write_u8(val as u8)?;
+                        i += 3;
+                    } else {
+                        // \. \\ and any other \X → literal next byte
+                        self.write_u8(c1)?;
+                        i += 1;
+                    }
+                } else {
+                    self.write_u8(b)?;
+                    i += 1;
+                }
+
+                if self.pos - body_start > 0x3f {
+                    return Err("Single label exceeds 63 characters of length".into());
+                }
             }
 
-            self.write_u8(len as u8)?;
-            for b in label.as_bytes() {
-                self.write_u8(*b)?;
+            let label_len = self.pos - body_start;
+            if label_len == 0 && i < bytes.len() {
+                // Empty label from leading/consecutive dots — roll back the placeholder.
+                self.pos = len_pos;
+            } else {
+                self.set(len_pos, label_len as u8)?;
+            }
+
+            if i < bytes.len() && bytes[i] == b'.' {
+                i += 1;
             }
         }
 
@@ -212,3 +272,160 @@ impl BytePacketBuffer {
         Ok(())
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn roundtrip(wire: &[u8]) -> String {
+        let mut buf = BytePacketBuffer::from_bytes(wire);
+        let mut out = String::new();
+        buf.read_qname(&mut out).unwrap();
+        out
+    }
+
+    fn write_then_read(text: &str) -> String {
+        let mut buf = BytePacketBuffer::new();
+        buf.write_qname(text).unwrap();
+        let wire_end = buf.pos();
+        buf.seek(0).unwrap();
+        let mut out = String::new();
+        buf.read_qname(&mut out).unwrap();
+        assert_eq!(
+            buf.pos(),
+            wire_end,
+            "reader should consume exactly what writer wrote"
+        );
+        out
+    }
+
+    #[test]
+    fn read_plain_domain() {
+        // [3]www[6]google[3]com[0]
+        let wire = b"\x03www\x06google\x03com\x00";
+        assert_eq!(roundtrip(wire), "www.google.com");
+    }
+
+    #[test]
+    fn read_label_with_literal_dot_is_escaped() {
+        // fanf2's example: [8]exa.mple[3]com[0] — two labels, first contains 0x2E
+        let wire = b"\x08exa.mple\x03com\x00";
+        assert_eq!(roundtrip(wire), "exa\\.mple.com");
+    }
+
+    #[test]
+    fn read_label_with_backslash_is_escaped() {
+        // [4]a\bc[3]com[0]
+        let wire = b"\x04a\\bc\x03com\x00";
+        assert_eq!(roundtrip(wire), "a\\\\bc.com");
+    }
+
+    #[test]
+    fn read_label_with_nonprintable_byte_uses_decimal_escape() {
+        // [4]\x00foo[3]com[0] — null byte at label start
+        let wire = b"\x04\x00foo\x03com\x00";
+        assert_eq!(roundtrip(wire), "\\000foo.com");
+    }
+
+    #[test]
+    fn read_label_with_space_uses_decimal_escape() {
+        // Space (0x20) is outside 0x21..=0x7E, so it must be decimal-escaped.
+        let wire = b"\x05a b c\x00";
+        assert_eq!(roundtrip(wire), "a\\032b\\032c");
+    }
+
+    #[test]
+    fn write_plain_domain() {
+        let mut buf = BytePacketBuffer::new();
+        buf.write_qname("www.google.com").unwrap();
+        assert_eq!(&buf.buf[..buf.pos], b"\x03www\x06google\x03com\x00");
+    }
+
+    #[test]
+    fn write_escaped_dot_does_not_split_label() {
+        let mut buf = BytePacketBuffer::new();
+        buf.write_qname("exa\\.mple.com").unwrap();
+        assert_eq!(&buf.buf[..buf.pos], b"\x08exa.mple\x03com\x00");
+    }
+
+    #[test]
+    fn write_escaped_backslash() {
+        let mut buf = BytePacketBuffer::new();
+        buf.write_qname("a\\\\bc.com").unwrap();
+        assert_eq!(&buf.buf[..buf.pos], b"\x04a\\bc\x03com\x00");
+    }
+
+    #[test]
+    fn write_decimal_escape_yields_raw_byte() {
+        let mut buf = BytePacketBuffer::new();
+        buf.write_qname("\\000foo.com").unwrap();
+        assert_eq!(&buf.buf[..buf.pos], b"\x04\x00foo\x03com\x00");
+    }
+
+    #[test]
+    fn write_skips_empty_labels() {
+        // Leading dot — first (empty) label is rolled back.
+        let mut buf = BytePacketBuffer::new();
+        buf.write_qname(".foo.com").unwrap();
+        assert_eq!(&buf.buf[..buf.pos], b"\x03foo\x03com\x00");
+
+        // Consecutive dots — middle empty label is rolled back.
+        let mut buf = BytePacketBuffer::new();
+        buf.write_qname("foo..com").unwrap();
+        assert_eq!(&buf.buf[..buf.pos], b"\x03foo\x03com\x00");
+    }
+
+    #[test]
+    fn write_rejects_out_of_range_decimal_escape() {
+        let mut buf = BytePacketBuffer::new();
+        assert!(buf.write_qname("\\999foo.com").is_err());
+    }
+
+    #[test]
+    fn write_rejects_trailing_backslash() {
+        let mut buf = BytePacketBuffer::new();
+        assert!(buf.write_qname("foo\\").is_err());
+    }
+
+    #[test]
+    fn write_rejects_short_decimal_escape() {
+        let mut buf = BytePacketBuffer::new();
+        assert!(buf.write_qname("\\1").is_err());
+    }
+
+    #[test]
+    fn write_rejects_label_over_63_bytes() {
+        // 64 bytes exceeds the wire-format label cap.
+        let mut buf = BytePacketBuffer::new();
+        assert!(buf.write_qname(&"a".repeat(64)).is_err());
+
+        // 63 bytes is the maximum permitted label length.
+        let mut buf = BytePacketBuffer::new();
+        assert!(buf.write_qname(&"a".repeat(63)).is_ok());
+    }
+
+    #[test]
+    fn roundtrip_preserves_dot_in_label() {
+        assert_eq!(write_then_read("exa\\.mple.com"), "exa\\.mple.com");
+    }
+
+    #[test]
+    fn roundtrip_preserves_backslash_in_label() {
+        assert_eq!(write_then_read("a\\\\b.com"), "a\\\\b.com");
+    }
+
+    #[test]
+    fn roundtrip_preserves_nonprintable_byte() {
+        assert_eq!(write_then_read("\\000foo.com"), "\\000foo.com");
+    }
+
+    #[test]
+    fn root_name_empty_and_dot_both_produce_single_zero() {
+        let mut a = BytePacketBuffer::new();
+        a.write_qname("").unwrap();
+        let mut b = BytePacketBuffer::new();
+        b.write_qname(".").unwrap();
+        assert_eq!(&a.buf[..a.pos], b"\x00");
+        assert_eq!(&b.buf[..b.pos], b"\x00");
+    }
+}
diff --git a/src/dnssec.rs b/src/dnssec.rs
index b336284..8614810 100644
--- a/src/dnssec.rs
+++ b/src/dnssec.rs
@@ -5,6 +5,7 @@ use log::{debug, trace};
 use ring::digest;
 use ring::signature;
 
+use crate::buffer::BytePacketBuffer;
 use crate::cache::{DnsCache, DnssecStatus};
 use crate::packet::DnsPacket;
 use crate::question::QueryType;
@@ -720,22 +721,29 @@ pub fn verify_ds(ds: &DnsRecord, dnskey: &DnsRecord, owner: &str) -> bool {
 
 // -- Canonical wire format --
 
+/// Encode a DNS name in canonical wire form per RFC 4034 §6.2:
+/// uncompressed, with ASCII letters lowercased.
+///
+/// Lowercasing happens *after* escape resolution because `\065` yields
+/// `'A'`, which canonical form must convert to `'a'`.
 pub fn name_to_wire(name: &str) -> Vec<u8> {
-    let mut wire = Vec::with_capacity(name.len() + 2);
-    if name == "." || name.is_empty() {
-        wire.push(0);
-        return wire;
-    }
-    for label in name.split('.') {
-        if label.is_empty() {
-            continue;
-        }
-        wire.push(label.len() as u8);
-        for &b in label.as_bytes() {
-            wire.push(b.to_ascii_lowercase());
+    let mut buf = BytePacketBuffer::new();
+    buf.write_qname(name)
+        .expect("name_to_wire: input must parse as a valid DNS name");
+    let mut wire = buf.filled().to_vec();
+
+    let mut i = 0;
+    while i < wire.len() {
+        let label_len = wire[i] as usize;
+        if label_len == 0 {
+            break;
         }
+        i += 1;
+        let end = i + label_len;
+        wire[i..end].make_ascii_lowercase();
+        i = end;
     }
-    wire.push(0);
+
     wire
 }
 
@@ -1475,6 +1483,23 @@ mod tests {
         );
     }
 
+    #[test]
+    fn name_to_wire_escaped_dot_in_label_is_not_a_separator() {
+        // `exa\.mple.com` is two labels: `exa.mple` (8 bytes including the 0x2E) and `com`.
+        let wire = name_to_wire("exa\\.mple.com");
+        assert_eq!(
+            wire,
+            vec![8, b'e', b'x', b'a', b'.', b'm', b'p', b'l', b'e', 3, b'c', b'o', b'm', 0]
+        );
+    }
+
+    #[test]
+    fn name_to_wire_decimal_escape_is_lowercased() {
+        // \065 = 'A', must become 'a' in canonical form.
+        let wire = name_to_wire("\\065bc.com");
+        assert_eq!(wire, vec![3, b'a', b'b', b'c', 3, b'c', b'o', b'm', 0]);
+    }
+
     #[test]
     fn parent_zone_cases() {
         assert_eq!(parent_zone("example.com"), "com");