diff --git a/.github/workflows/static.yml b/.github/workflows/static.yml index b0274ce..9db4306 100644 --- a/.github/workflows/static.yml +++ b/.github/workflows/static.yml @@ -31,6 +31,10 @@ jobs: steps: - name: Checkout uses: actions/checkout@v4 + - name: Install pandoc + run: sudo apt-get install -y pandoc + - name: Generate blog HTML + run: make blog - name: Setup Pages uses: actions/configure-pages@v5 - name: Upload artifact diff --git a/.gitignore b/.gitignore index 1b715be..9dcba3d 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ /target CLAUDE.md docs/ +site/blog/posts/ diff --git a/Makefile b/Makefile index 81ee410..9e95829 100644 --- a/Makefile +++ b/Makefile @@ -23,11 +23,11 @@ bench: cargo bench blog: - @mkdir -p site/blog + @mkdir -p site/blog/posts @for f in blog/*.md; do \ name=$$(basename "$$f" .md); \ - pandoc "$$f" --template=site/blog-template.html -o "site/blog/$$name.html"; \ - echo " $$f → site/blog/$$name.html"; \ + pandoc "$$f" --template=site/blog-template.html -o "site/blog/posts/$$name.html"; \ + echo " $$f → site/blog/posts/$$name.html"; \ done clean: diff --git a/blog/dns-from-scratch.md b/blog/dns-from-scratch.md index b0740be..7bf666c 100644 --- a/blog/dns-from-scratch.md +++ b/blog/dns-from-scratch.md @@ -8,7 +8,7 @@ I wanted to understand how DNS actually works. Not the "it translates domain nam So I built one from scratch in Rust. No `hickory-dns`, no `trust-dns`, no `simple-dns`. The entire RFC 1035 wire protocol — headers, labels, compression pointers, record types — parsed and serialized by hand. It started as a weekend learning project, became a side project I kept coming back to over 6 years, and eventually turned into [Numa](https://github.com/razvandimescu/numa) — which I now use as my actual system DNS. -A note on terminology: Numa supports two resolution modes. *Forward* mode relays queries to an upstream (Quad9, Cloudflare, or any DoH provider). *Recursive* mode walks the delegation chain from root servers itself — iterative queries to root, TLD, and authoritative nameservers, with full DNSSEC validation. In both modes, Numa does useful things with your DNS traffic locally (caching, ad blocking, overrides, local service domains) before resolving what it can't answer. This post covers the wire protocol and forwarding path; [the next post](/blog/dnssec-from-scratch.html) covers recursive resolution and DNSSEC. +A note on terminology: Numa supports two resolution modes. *Forward* mode relays queries to an upstream (Quad9, Cloudflare, or any DoH provider). *Recursive* mode walks the delegation chain from root servers itself — iterative queries to root, TLD, and authoritative nameservers, with full DNSSEC validation. In both modes, Numa does useful things with your DNS traffic locally (caching, ad blocking, overrides, local service domains) before resolving what it can't answer. This post covers the wire protocol and forwarding path; [the next post](/blog/posts/dnssec-from-scratch.html) covers recursive resolution and DNSSEC. Here's what surprised me along the way. @@ -317,7 +317,7 @@ That creates the DNS entry, generates a TLS certificate, and starts proxying — **Update (March 2026):** Recursive resolution and DNSSEC validation are now shipped. Numa resolves from root nameservers with full chain-of-trust verification (RSA/SHA-256, ECDSA P-256, Ed25519) and NSEC/NSEC3 authenticated denial of existence. 
-**[Read the follow-up: Implementing DNSSEC from Scratch in Rust →](/blog/dnssec-from-scratch.html)** +**[Read the follow-up: Implementing DNSSEC from Scratch in Rust →](/blog/posts/dnssec-from-scratch.html)** Still on the roadmap: diff --git a/blog/dnssec-from-scratch.md b/blog/dnssec-from-scratch.md index 82c712f..fc9db7c 100644 --- a/blog/dnssec-from-scratch.md +++ b/blog/dnssec-from-scratch.md @@ -4,7 +4,7 @@ description: Recursive resolution from root hints, chain-of-trust validation, NS date: March 2026 --- -In the [previous post](/blog/dns-from-scratch.html) I covered how DNS works at the wire level — packet format, label compression, TTL caching, DoH. Numa was a forwarding resolver: it parsed packets, did useful things locally, and relayed the rest to Cloudflare or Quad9. +In the [previous post](/blog/posts/dns-from-scratch.html) I covered how DNS works at the wire level — packet format, label compression, TTL caching, DoH. Numa was a forwarding resolver: it parsed packets, did useful things locally, and relayed the rest to Cloudflare or Quad9. That post ended with "recursive resolution and DNSSEC are on the roadmap." This post is about building both. diff --git a/site/blog/dns-from-scratch.html b/site/blog/dns-from-scratch.html deleted file mode 100644 index 9ac4ab4..0000000 --- a/site/blog/dns-from-scratch.html +++ /dev/null @@ -1,651 +0,0 @@ [stripped HTML head; title: I Built a DNS Resolver from Scratch in Rust — Numa]

I Built a DNS Resolver from Scratch in Rust

I wanted to understand how DNS actually works. Not the “it translates domain names to IP addresses” explanation — the actual bytes on the wire. What does a DNS packet look like? How does label compression work? Why is everything crammed into 512 bytes?

So I built one from scratch in Rust. No hickory-dns, no trust-dns, no simple-dns. The entire RFC 1035 wire protocol — headers, labels, compression pointers, record types — parsed and serialized by hand. It started as a weekend learning project, became a side project I kept coming back to over 6 years, and eventually turned into Numa — which I now use as my actual system DNS.

A note on terminology: Numa supports two resolution modes. Forward mode relays queries to an upstream (Quad9, Cloudflare, or any DoH provider). Recursive mode walks the delegation chain from root servers itself — iterative queries to root, TLD, and authoritative nameservers, with full DNSSEC validation. In both modes, Numa does useful things with your DNS traffic locally (caching, ad blocking, overrides, local service domains) before resolving what it can’t answer. This post covers the wire protocol and forwarding path; the next post covers recursive resolution and DNSSEC.

Here’s what surprised me along the way.

What does a DNS packet actually look like?

You can see a real one yourself. Run this:

dig @127.0.0.1 example.com A +noedns

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15242
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;example.com.                   IN      A

;; ANSWER SECTION:
example.com.            53      IN      A       104.18.27.120
example.com.            53      IN      A       104.18.26.120

That’s the human-readable version. But what’s actually on the wire? A DNS query for example.com A is just 29 bytes:

         ID    Flags  QCount ACount NSCount ARCount
        ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐
Header: AB CD  01 00  00 01  00 00  00 00  00 00
        └────┘ └────┘ └────┘ └────┘ └────┘ └────┘
         ↑      ↑      ↑
         │      │      └─ 1 question, 0 answers, 0 authority, 0 additional
         │      └─ Standard query, recursion desired
         └─ Random ID (we'll match this in the response)

Question: 07 65 78 61 6D 70 6C 65  03 63 6F 6D  00  00 01  00 01
          ── ─────────────────────  ── ─────────  ──  ─────  ─────
          7  e  x  a  m  p  l  e   3  c  o  m   end  A      IN
          ↑                        ↑             ↑
          └─ length prefix         └─ length     └─ root label (end of name)

12 bytes of header + 17 bytes of question = 29 bytes to ask “what’s the IP for example.com?” Compare that to an HTTP request for the same information — you’d need hundreds of bytes just for headers.

We can send exactly those bytes and capture what comes back:

python3 -c "
import socket
# Hand-craft a DNS query: header (12 bytes) + question (17 bytes)
q  = b'\xab\xcd\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00'  # header
q += b'\x07example\x03com\x00\x00\x01\x00\x01'              # question
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.sendto(q, ('127.0.0.1', 53))
resp = s.recv(512)
for i in range(0, len(resp), 16):
    h = ' '.join(f'{b:02x}' for b in resp[i:i+16])
    a = ''.join(chr(b) if 32<=b<127 else '.' for b in resp[i:i+16])
    print(f'{i:08x}  {h:<48s}  {a}')
"

00000000  ab cd 81 80 00 01 00 02 00 00 00 00 07 65 78 61   .............exa
00000010  6d 70 6c 65 03 63 6f 6d 00 00 01 00 01 07 65 78   mple.com......ex
00000020  61 6d 70 6c 65 03 63 6f 6d 00 00 01 00 01 00 00   ample.com.......
00000030  00 19 00 04 68 12 1b 78 07 65 78 61 6d 70 6c 65   ....h..x.example
00000040  03 63 6f 6d 00 00 01 00 01 00 00 00 19 00 04 68   .com...........h
00000050  12 1a 78                                          ..x

83 bytes back. Let’s annotate the response:

         ID    Flags  QCount ACount NSCount ARCount
        ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐
Header: AB CD  81 80  00 01  00 02  00 00  00 00
        └────┘ └────┘ └────┘ └────┘ └────┘ └────┘
         ↑      ↑      ↑      ↑
         │      │      │      └─ 2 answers
         │      │      └─ 1 question (echoed back)
         │      └─ Response flag set, recursion available
         └─ Same ID as our query

Question: 07 65 78 61 6D 70 6C 65  03 63 6F 6D  00  00 01  00 01
          (same as our query — echoed back)

Answer 1: 07 65 78 61 6D 70 6C 65  03 63 6F 6D  00  00 01  00 01
          ─────────────────────────────────────  ──  ─────  ─────
          e  x  a  m  p  l  e  .  c  o  m       end  A      IN

          00 00 00 19  00 04  68 12 1B 78
          ───────────  ─────  ───────────
          TTL: 25s     len:4  104.18.27.120

Answer 2: (same domain repeated)  00 01  00 01  00 00 00 19  00 04  68 12 1A 78
                                                                    ───────────
                                                                    104.18.26.120

Notice something wasteful? The domain example.com appears three times — once in the question, twice in the answers. That’s 39 bytes of repeated names in an 83-byte packet. DNS has a solution for this — but first, the overall structure.

The whole thing fits in a single UDP datagram. The structure is:

+--+--+--+--+--+--+--+--+
|         Header         |  12 bytes: ID, flags, counts
+--+--+--+--+--+--+--+--+
|        Questions       |  What you're asking
+--+--+--+--+--+--+--+--+
|         Answers        |  The response records
+--+--+--+--+--+--+--+--+
|       Authorities      |  NS records for the zone
+--+--+--+--+--+--+--+--+
|       Additional       |  Extra helpful records
+--+--+--+--+--+--+--+--+

In Rust, parsing the header is just reading 12 bytes and unpacking the flags:

pub fn read(buffer: &mut BytePacketBuffer) -> Result<DnsHeader> {
    let id = buffer.read_u16()?;
    let flags = buffer.read_u16()?;
    // Flags pack 9 fields into 16 bits
    let recursion_desired = (flags & (1 << 8)) > 0;
    let truncated_message = (flags & (1 << 9)) > 0;
    let authoritative_answer = (flags & (1 << 10)) > 0;
    let opcode = (flags >> 11) & 0x0F;
    let response = (flags & (1 << 15)) > 0;
    // ... and so on
}

No padding, no alignment, no JSON overhead. DNS was designed in 1987 when every byte counted, and honestly? The wire format is kind of beautiful in its efficiency.

Label compression is the clever part

Remember how example.com appeared three times in that 83-byte response? Domain names in DNS are stored as a sequence of labels — length-prefixed segments:

example.com → [7]example[3]com[0]

The [7] means “the next 7 bytes are a label.” The [0] is the root label (end of name). That’s 13 bytes per occurrence, 39 bytes for three repetitions. In a response with authority and additional records, domain names can account for half the packet.

DNS solves this with compression pointers — if the top two bits of a length byte are 11, the remaining 14 bits are an offset back into the packet where the rest of the name can be found. A well-compressed version of our response would replace the answer names with C0 0C — a 2-byte pointer to offset 12, where example.com first appears in the question section. That turns 39 bytes of names into 17 (13 + 2 + 2). Our upstream didn’t bother compressing, but many do — especially when related domains appear:

Offset 0x20: [6]google[3]com[0]        ← full name
Offset 0x40: [4]mail[0xC0][0x20]       ← "mail" + pointer to offset 0x20
Offset 0x50: [3]www[0xC0][0x20]        ← "www" + pointer to offset 0x20

Pointers can chain — a pointer can point to another pointer. Parsing this correctly requires tracking your position in the buffer and handling jumps:

pub fn read_qname(&mut self, outstr: &mut String) -> Result<()> {
    let mut pos = self.pos();
    let mut jumped = false;
    let mut delim = "";

    loop {
        let len = self.get(pos)?;

        // Top two bits set = compression pointer.
        // NOTE: production code should also cap the number of pointer jumps
        // (e.g. 5) so a crafted pointer loop can't spin the parser forever.
        if (len & 0xC0) == 0xC0 {
            if !jumped {
                self.seek(pos + 2)?; // advance past the pointer
            }
            let offset = (((len as u16) ^ 0xC0) << 8) | self.get(pos + 1)? as u16;
            pos = offset as usize;
            jumped = true;
            continue;
        }

        pos += 1;
        if len == 0 { break; } // root label

        outstr.push_str(delim);
        outstr.push_str(&self.get_range(pos, len as usize)?
            .iter().map(|&b| b as char).collect::<String>());
        delim = ".";
        pos += len as usize;
    }

    if !jumped {
        self.seek(pos)?;
    }
    Ok(())
}

This one bit me: when you follow a pointer, you must not advance the buffer’s read position past where you jumped from. The pointer is 2 bytes, so you advance by 2, but the actual label data lives elsewhere in the packet. If you follow the pointer and also advance past it, you’ll skip over the next record entirely. I spent a fun evening debugging that one.

TTL adjustment on read, not write

This is my favorite trick in the whole codebase. I initially stored the remaining TTL and decremented it, which meant I needed a background thread to sweep expired entries. It worked, but it felt wrong — too much machinery for something simple.

The cleaner approach: store the original TTL and the timestamp when the record was cached. On read, compute remaining = original_ttl - elapsed. If it’s zero or negative, the entry is stale — evict it lazily.

pub fn lookup(&mut self, domain: &str, qtype: QueryType) -> Option<DnsPacket> {
    let key = (domain.to_lowercase(), qtype);
    let entry = self.entries.get(&key)?;
    let elapsed = entry.cached_at.elapsed().as_secs() as u32;

    if elapsed >= entry.original_ttl {
        self.entries.remove(&key);
        return None;
    }

    // Adjust TTLs in the response to reflect remaining time
    let mut packet = entry.packet.clone();
    for answer in &mut packet.answers {
        answer.set_ttl(entry.original_ttl.saturating_sub(elapsed));
    }
    Some(packet)
}

No background thread. No timer. Entries expire lazily. The cache stays consistent because every consumer sees the adjusted TTL.

The resolution pipeline

Each incoming UDP packet spawns a tokio task. Each task walks a deterministic pipeline — every step either answers or passes to the next:

                     ┌─────────────────────────────────────────────────────┐
                     │              Numa Resolution Pipeline               │
                     └─────────────────────────────────────────────────────┘

  Query ──→ Overrides ──→ .numa TLD ──→ Blocklist ──→ Zones ──→ Cache ──→ DoH
    │        │              │             │             │         │         │
    │        │ match?       │ match?      │ blocked?    │ match?  │ hit?    │
    │        ↓              ↓             ↓             ↓         ↓         ↓
    │      respond        respond       0.0.0.0      respond   respond   forward
    │      (auto-reverts  (reverse      (ad gone)    (static   (TTL      to upstream
    │       after N min)   proxy+TLS)                 records)  adjusted) (encrypted)
    │
    └──→ Each step either answers or passes to the next.

This is where “from scratch” pays off. Want conditional forwarding for Tailscale? Insert a step before the upstream. Want to override api.example.com for 5 minutes while debugging? Add an entry in the overrides step — it auto-expires. A DNS library would have hidden this pipeline behind an opaque resolve() call.

DNS-over-HTTPS: the “wait, that’s it?” moment

The most recent addition, and honestly the one that surprised me with how little code it needed. DoH (RFC 8484) is conceptually simple: take the exact same DNS wire-format packet you’d send over UDP, POST it to an HTTPS endpoint with Content-Type: application/dns-message, and parse the response the same way. Same bytes, different transport.

async fn forward_doh(
    query: &DnsPacket,
    url: &str,
    client: &reqwest::Client,
    timeout_duration: Duration,
) -> Result<DnsPacket> {
    let mut send_buffer = BytePacketBuffer::new();
    query.write(&mut send_buffer)?;

    let resp = timeout(timeout_duration, client
        .post(url)
        .header("content-type", "application/dns-message")
        .header("accept", "application/dns-message")
        .body(send_buffer.filled().to_vec())
        .send())
    .await??.error_for_status()?;

    let bytes = resp.bytes().await?;
    let mut recv_buffer = BytePacketBuffer::from_bytes(&bytes);
    DnsPacket::from_buffer(&mut recv_buffer)
}

The one gotcha that cost me an hour: Quad9 and other DoH providers require HTTP/2. My first attempt used HTTP/1.1 and got a cryptic 400 Bad Request. Adding the http2 feature to reqwest fixed it. The upside of HTTP/2? Connection multiplexing means subsequent queries reuse the TLS session — ~16ms vs ~50ms for the first query. Free performance.

The Upstream enum dispatches between UDP and DoH based on the URL scheme:

pub enum Upstream {
    Udp(SocketAddr),
    Doh { url: String, client: reqwest::Client },
}

If the configured address starts with https://, it’s DoH. Otherwise, plain UDP. Simple, no toggles.

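A sketch of that dispatch, assuming a constructor like this (from_config is an illustrative name, and error conversion is glossed over):

impl Upstream {
    // Illustrative constructor: choose the transport from the address format.
    pub fn from_config(addr: &str) -> Result<Upstream> {
        if addr.starts_with("https://") {
            Ok(Upstream::Doh {
                url: addr.to_string(),
                client: reqwest::Client::new(), // http2 feature enabled (see above)
            })
        } else {
            Ok(Upstream::Udp(addr.parse()?)) // e.g. "9.9.9.9:53"
        }
    }
}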

“Why not just use dnsmasq + nginx + mkcert?”

You absolutely can — those are mature, battle-tested tools. The difference is integration: with dnsmasq + nginx + mkcert, you’re configuring three tools with three config formats. Numa puts the DNS record, reverse proxy, and TLS cert behind one API call:

curl -X POST localhost:5380/services -d '{"name":"frontend","target_port":5173}'

That creates the DNS entry, generates a TLS certificate, and starts proxying — including WebSocket upgrade for Vite HMR. One command, no config files. Having full control over the resolution pipeline is what makes auto-revert overrides and LAN discovery possible.

What I learned

DNS is a 40-year-old protocol that works remarkably well. The wire format is tight, the caching model is elegant, and the hierarchical delegation system has scaled to billions of queries per day. The things people complain about (DNSSEC complexity, lack of encryption) are extensions bolted on decades later, not flaws in the original design.

The hard parts aren’t where you’d expect. Parsing the wire protocol was straightforward (RFC 1035 is well-written). The hard parts were: browsers rejecting wildcard certs under single-label TLDs, macOS resolver quirks (scutil vs /etc/resolv.conf), and getting multiple processes to bind the same multicast port (SO_REUSEPORT on macOS, SO_REUSEADDR on Linux).

Learn the vocabulary before you show up. I initially called Numa a “DNS resolver” and got corrected — it’s a forwarding resolver. The distinction matters to people who work with DNS professionally, and being sloppy about it cost me credibility in my first community posts.

What’s next


Update (March 2026): Recursive resolution and DNSSEC validation are now shipped. Numa resolves from root nameservers with full chain-of-trust verification (RSA/SHA-256, ECDSA P-256, Ed25519) and NSEC/NSEC3 authenticated denial of existence.

Read the follow-up: Implementing DNSSEC from Scratch in Rust →

Still on the roadmap:

github.com/razvandimescu/numa

diff --git a/site/blog/dnssec-from-scratch.html b/site/blog/dnssec-from-scratch.html deleted file mode 100644 index 547dbac..0000000 --- a/site/blog/dnssec-from-scratch.html +++ /dev/null @@ -1,646 +0,0 @@ [stripped HTML head; title: Implementing DNSSEC from Scratch in Rust — Numa]

Implementing DNSSEC from Scratch in Rust

In the previous post I covered how DNS works at the wire level — packet format, label compression, TTL caching, DoH. Numa was a forwarding resolver: it parsed packets, did useful things locally, and relayed the rest to Cloudflare or Quad9.

That post ended with “recursive resolution and DNSSEC are on the roadmap.” This post is about building both.

The short version: Numa now resolves from root nameservers with iterative queries, validates the full DNSSEC chain of trust, and cryptographically proves that non-existent domains don’t exist. No upstream dependency. No DNS libraries. Just ring for the crypto primitives and a lot of RFC reading.

Why recursive?

A forwarding resolver trusts its upstream. When you ask Quad9 for cloudflare.com, you trust that Quad9 returns the real answer. If Quad9 lies, gets compromised, or is legally compelled to redirect you — you have no way to know.

A recursive resolver doesn’t trust anyone. It starts at the root nameservers (operated by 12 independent organizations) and follows the delegation chain: root → .com TLD → cloudflare.com authoritative servers. Each server only answers for its own zone. No single entity sees your full query pattern.

DNSSEC adds cryptographic proof to each step. The root signs .com’s key. .com signs cloudflare.com’s key. cloudflare.com signs its own records. If any step is tampered with, the chain breaks and Numa rejects the response.

The iterative resolution loop

Recursive resolution is a misnomer — the resolver actually uses iterative queries. It asks root “where is cloudflare.com?”, root says “I don’t know, but here are the .com nameservers.” It asks .com, which says “here are cloudflare’s nameservers.” It asks those, and gets the answer.

resolve("cloudflare.com", A)
  → ask 198.41.0.4 (a.root-servers.net)
    ← "try .com: ns1.gtld-servers.net (192.5.6.30)"  [referral + glue]
  → ask 192.5.6.30 (ns1.gtld-servers.net)
    ← "try cloudflare: ns1.cloudflare.com (173.245.58.51)"  [referral + glue]
  → ask 173.245.58.51 (ns1.cloudflare.com)
    ← "104.16.132.229"  [answer]

The implementation (src/recursive.rs) is a loop with three possible outcomes per query:

1. Answer — the server knows the record. Cache it, return it.
2. Referral — the server delegates to another zone. Extract NS records and glue (A/AAAA records for the nameservers, included in the additional section to avoid a chicken-and-egg problem), then query the next server.
3. NXDOMAIN/REFUSED — the name doesn’t exist or the server refuses. Cache the negative result.

CNAME chasing adds complexity: if you ask for www.cloudflare.com and get a CNAME to cloudflare.com, you need to restart resolution for the new name. I cap this at 8 levels.

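Condensed into a sketch (query, pick_root_server, cname_target, referral_with_glue, and the cache calls are hypothetical helpers standing in for the real machinery in src/recursive.rs):

async fn resolve(mut name: String, qtype: QueryType) -> Result<DnsPacket> {
    let mut server = pick_root_server(); // e.g. 198.41.0.4 (a.root-servers.net)
    let mut cname_depth = 0;

    loop {
        let response = query(server, &name, qtype).await?;

        // Outcome 1a: CNAME. Restart resolution for the target, capped at 8 levels.
        if let Some(target) = cname_target(&response, &name, qtype) {
            cname_depth += 1;
            if cname_depth > 8 {
                return Err(Error::TooManyCnames); // hypothetical error variant
            }
            name = target;
            server = pick_root_server();
            continue;
        }

        // Outcome 1b: answer. Cache it and return it.
        if !response.answers.is_empty() {
            cache(&response);
            return Ok(response);
        }

        // Outcome 2: referral. NS records plus glue tell us whom to ask next.
        if let Some(next_server) = referral_with_glue(&response) {
            server = next_server;
            continue;
        }

        // Outcome 3: NXDOMAIN/REFUSED. Cache the negative result.
        cache_negative(&name, qtype, &response);
        return Ok(response);
    }
}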

TLD priming

Cold-cache resolution is slow. Every query needs root → TLD → authoritative, each with its own network round-trip. For the first query to example.com, that’s three serial UDP round-trips before you get an answer.

TLD priming solves this. On startup, Numa queries root for NS records of 34 common TLDs (.com, .org, .net, .io, .dev, plus EU ccTLDs), caching NS records, glue addresses, DS records, and DNSKEY records. After priming, the first query to any .com domain skips root entirely — it already knows where .com’s nameservers are, and already has the DNSSEC keys needed to validate the response.

DNSSEC chain of trust

DNSSEC doesn’t encrypt DNS traffic. It signs it. Every DNS record can have an accompanying RRSIG (signature) record. The resolver verifies the signature against the zone’s DNSKEY, then verifies that DNSKEY against the parent zone’s DS (delegation signer) record, walking up until it reaches the root trust anchor — a hardcoded public key that IANA publishes and the entire internet agrees on.

cloudflare.com A 104.16.132.229
  signed by → RRSIG (key_tag=34505, algo=13, signer=cloudflare.com)
  verified with → DNSKEY (cloudflare.com, key_tag=34505, ECDSA P-256)
  vouched for by → DS (at .com, key_tag=2371, digest=SHA-256 of cloudflare's DNSKEY)
  signed by → RRSIG (key_tag=19718, signer=com)
  verified with → DNSKEY (com, key_tag=19718)
  vouched for by → DS (at root, key_tag=30909)
  signed by → RRSIG (signer=.)
  verified with → DNSKEY (., key_tag=20326)  ← root trust anchor (hardcoded)

How keys get there

The domain owner generates the DNSKEY keypair — typically their DNS provider (Cloudflare, etc.) does this. The owner then submits the DS record (a hash of their DNSKEY) to their registrar (Namecheap, GoDaddy), who passes it to the registry (Verisign for .com). The registry signs it into the TLD zone, and IANA signs the TLD’s DS into the root. Trust flows up; keys flow down.

The irony: you “own” your DNSSEC keys, but your registrar controls whether the DS record gets published. If they remove it — by mistake, by policy, or by court order — your DNSSEC chain breaks silently.

The trust anchor

IANA’s root KSK (Key Signing Key) has key tag 20326, algorithm 8 (RSA/SHA-256), and a 2048-bit RSA public key (a 256-byte modulus). It was last rolled in 2018. I hardcode it as a const array — this is the one thing in the entire system that requires out-of-band trust.

const ROOT_KSK_PUBLIC_KEY: &[u8] = &[
    0x03, 0x01, 0x00, 0x01, 0xac, 0xff, 0xb4, 0x09,
    // ... 260 bytes total: 1-byte exponent length, 3-byte exponent, 256-byte modulus
];

When IANA rolls this key (rare — the previous key lasted from 2010 to 2018), every DNSSEC validator on the internet needs updating. For Numa, that means a binary update. Something to watch.

Every DNSKEY has a key tag — a 16-bit checksum over its RDATA (RFC 4034 Appendix B). The first test I wrote: compute the root KSK’s key tag and assert it equals 20326. Instant confidence that the RDATA encoding is correct.

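The computation (RFC 4034 Appendix B) sums the RDATA as big-endian 16-bit words and folds the carry back in. The test mirrors the one described above, assuming the full 260-byte ROOT_KSK_PUBLIC_KEY constant:

/// Key tag per RFC 4034 Appendix B.
fn key_tag(rdata: &[u8]) -> u16 {
    let mut acc: u32 = 0;
    for (i, &byte) in rdata.iter().enumerate() {
        // Even indexes are the high byte of a 16-bit word, odd the low byte.
        acc += if i % 2 == 0 { (byte as u32) << 8 } else { byte as u32 };
    }
    acc += (acc >> 16) & 0xFFFF; // fold the carry into the low 16 bits
    (acc & 0xFFFF) as u16
}

#[test]
fn root_ksk_key_tag_is_20326() {
    // DNSKEY RDATA = flags (257 = KSK) + protocol (3) + algorithm (8) + key
    let mut rdata = vec![0x01, 0x01, 0x03, 0x08];
    rdata.extend_from_slice(ROOT_KSK_PUBLIC_KEY);
    assert_eq!(key_tag(&rdata), 20326);
}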

The crypto

Numa uses ring for all cryptographic operations. Three algorithms cover the vast majority of signed zones:

Algorithm      ID   Usage                           Verify time
RSA/SHA-256    8    Root, most TLDs                 10.9 µs
ECDSA P-256    13   Cloudflare, many modern zones   174 ns
Ed25519        15   Newer zones                     ~200 ns

RSA key format conversion

DNS stores RSA public keys in RFC 3110 format (exponent length, exponent, modulus). ring expects PKCS#1 DER (ASN.1 encoded). Converting between them means writing a minimal ASN.1 encoder with leading-zero stripping and sign-bit padding. Getting this wrong produces keys that ring silently rejects — one of the harder bugs to track down.

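A sketch of both halves: parse the RFC 3110 blob, then emit SEQUENCE { INTEGER n, INTEGER e }, the RSAPublicKey encoding that ring's RSA verifiers expect. Function names are illustrative:

/// RFC 3110: [exponent length (1 byte, or 0 + 2 bytes)] [exponent] [modulus].
fn parse_rfc3110(key: &[u8]) -> Option<(&[u8], &[u8])> {
    let (exp_len, rest) = match key.split_first()? {
        (&0, rest) => {
            // A leading zero means the real length follows as 16 bits.
            let len = u16::from_be_bytes([*rest.first()?, *rest.get(1)?]) as usize;
            (len, &rest[2..])
        }
        (&n, rest) => (n as usize, rest),
    };
    if rest.len() <= exp_len {
        return None; // no room left for a modulus
    }
    Some((&rest[..exp_len], &rest[exp_len..])) // (exponent, modulus)
}

/// Minimal ASN.1: SEQUENCE { INTEGER modulus, INTEGER exponent }.
fn to_pkcs1_der(modulus: &[u8], exponent: &[u8]) -> Vec<u8> {
    fn der_len(len: usize) -> Vec<u8> {
        if len < 128 {
            vec![len as u8]
        } else {
            let bytes: Vec<u8> =
                len.to_be_bytes().iter().copied().skip_while(|&b| b == 0).collect();
            let mut out = vec![0x80 | bytes.len() as u8];
            out.extend(bytes);
            out
        }
    }
    fn integer(bytes: &[u8]) -> Vec<u8> {
        let mut v = bytes;
        while v.len() > 1 && v[0] == 0 {
            v = &v[1..]; // strip leading zeros
        }
        let pad = v[0] & 0x80 != 0; // pad so the value stays positive in DER
        let mut out = vec![0x02]; // INTEGER tag
        out.extend_from_slice(&der_len(v.len() + pad as usize));
        if pad {
            out.push(0x00);
        }
        out.extend_from_slice(v);
        out
    }
    let body = [integer(modulus), integer(exponent)].concat();
    let mut out = vec![0x30]; // SEQUENCE tag
    out.extend_from_slice(&der_len(body.len()));
    out.extend_from_slice(&body);
    out
}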

ECDSA is simpler

ECDSA P-256 keys in DNS are 64 bytes (x + y coordinates). ring expects uncompressed point format: 0x04 prefix + 64 bytes. Three lines:

let mut uncompressed = Vec::with_capacity(65);
uncompressed.push(0x04);
uncompressed.extend_from_slice(public_key);  // 64 bytes from DNS

Signatures are also 64 bytes (r + s), used directly. No format conversion needed.

Building the signed data

RRSIG verification doesn’t sign the DNS packet — it signs a canonical form of the records. Building this correctly is the most detail-sensitive part of DNSSEC. The signed data is:

1. RRSIG RDATA fields (type covered, algorithm, labels, original TTL, expiration, inception, key tag, signer name) — without the signature itself
2. For each record in the RRset: owner name (lowercased, uncompressed) + type + class + original TTL (from the RRSIG, not the record’s current TTL) + RDATA length + canonical RDATA

The records must be sorted by their canonical wire-format representation. Owner names must be lowercased. The TTL must be the original TTL from the RRSIG, not the decremented TTL from caching.

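In sketch form (Rrsig, DnsRecord, and the helper methods are hypothetical; the real construction lives in src/dnssec.rs):

fn build_signed_data(rrsig: &Rrsig, rrset: &[DnsRecord]) -> Vec<u8> {
    let mut buf = Vec::new();

    // 1. RRSIG RDATA up to, but not including, the signature field.
    buf.extend_from_slice(&rrsig.rdata_without_signature());

    // 2. Each record in canonical form: lowercased uncompressed owner name,
    //    type, class, the RRSIG's original TTL, and canonical RDATA.
    let mut wires: Vec<Vec<u8>> = rrset
        .iter()
        .map(|r| {
            let mut w = Vec::new();
            w.extend_from_slice(&encode_name(&r.name().to_lowercase()));
            w.extend_from_slice(&r.rtype().to_be_bytes());
            w.extend_from_slice(&1u16.to_be_bytes()); // class IN
            w.extend_from_slice(&rrsig.original_ttl.to_be_bytes()); // not the cached TTL
            let rdata = r.canonical_rdata();
            w.extend_from_slice(&(rdata.len() as u16).to_be_bytes());
            w.extend_from_slice(&rdata);
            w
        })
        .collect();
    wires.sort(); // byte-wise sort of wire forms = canonical RRset order
    for w in &wires {
        buf.extend_from_slice(w);
    }
    buf
}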

Getting any of these details wrong — wrong TTL, wrong case, wrong sort order, wrong RDATA encoding — produces a valid-looking but incorrect signed data blob, and ring returns a signature mismatch with no diagnostic information. I spent more time debugging signed data construction than any other part of DNSSEC.

Proving a name doesn’t exist

Verifying that cloudflare.com has a valid A record is one thing. Proving that doesnotexist.cloudflare.com doesn’t exist — cryptographically, in a way that can’t be forged — is harder.

NSEC

NSEC records form a chain. Each NSEC says “the next name in this zone after me is X, and at my name these record types exist.” If you query beta.example.com and the zone has alpha.example.com → NSEC → gamma.example.com, the gap proves beta doesn’t exist — there’s nothing between alpha and gamma.

For NXDOMAIN proofs, RFC 4035 §5.4 requires two things:

1. An NSEC record whose gap covers the queried name
2. An NSEC record proving no wildcard exists at the closest encloser

The canonical DNS name ordering (RFC 4034 §6.1) compares labels right-to-left, case-insensitive. a.example.com < b.example.com because at the example.com level they’re equal, then a < b. But z.example.com < a.example.org because .com < .org at the TLD level.

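That ordering is short to implement. A sketch over dotted string names (the real thing operates on wire-format labels):

/// Canonical name ordering (RFC 4034 §6.1): compare labels right to left,
/// case-insensitively, byte-wise.
fn canonical_cmp(a: &str, b: &str) -> std::cmp::Ordering {
    let labels = |name: &str| -> Vec<Vec<u8>> {
        name.trim_end_matches('.')
            .split('.')
            .rev() // most-significant label (the TLD) first
            .map(|label| label.to_ascii_lowercase().into_bytes())
            .collect()
    };
    labels(a).cmp(&labels(b)) // Vec<Vec<u8>> compares lexicographically
}

#[test]
fn ordering_matches_the_examples() {
    use std::cmp::Ordering::Less;
    assert_eq!(canonical_cmp("a.example.com", "b.example.com"), Less);
    assert_eq!(canonical_cmp("z.example.com", "a.example.org"), Less);
}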

NSEC3

NSEC3 solves NSEC’s zone enumeration problem — with NSEC, you can walk the chain and discover every name in the zone. NSEC3 hashes the names first (iterated SHA-1 with a salt), so the NSEC3 chain reveals hashes, not names.

The proof is a 3-part closest encloser proof (RFC 5155 §8.4): find an ancestor whose hash matches an NSEC3 owner, prove the next-closer name falls within a hash range gap, and prove the wildcard at the closest encloser also falls within a gap. All three must hold, or the denial is rejected.

I cap NSEC3 iterations at 500 (RFC 9276 recommends 0). Higher iteration counts are a DoS vector — each verification requires iterations + 1 SHA-1 hashes.

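The hash itself (RFC 5155 §5) is a few lines with ring, whose SHA-1 constant is pointedly named for occasions like this:

use ring::digest;

/// NSEC3 hash: H(x) = SHA-1(x || salt), then `iterations` more passes.
/// `owner_wire` is the lowercased wire-format owner name.
fn nsec3_hash(owner_wire: &[u8], salt: &[u8], iterations: u16) -> Vec<u8> {
    let mut h = digest::digest(&digest::SHA1_FOR_LEGACY_USE_ONLY,
                               &[owner_wire, salt].concat());
    for _ in 0..iterations {
        h = digest::digest(&digest::SHA1_FOR_LEGACY_USE_ONLY,
                           &[h.as_ref(), salt].concat());
    }
    h.as_ref().to_vec() // 20 bytes; zones publish it in base32hex
}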

Making it fast

Cold-cache DNSSEC validation initially required ~5 network fetches per query (DNSKEY for each zone in the chain, plus DS records). Three optimizations brought this down to ~1:

TLD priming (startup) — fetch root DNSKEY + each TLD’s NS/DS/DNSKEY. After priming, the trust chain from root to any .com zone is fully cached.

Referral DS piggybacking — when a TLD server refers you to cloudflare.com’s nameservers, the authority section often includes DS records for the child zone. Cache them during resolution instead of fetching separately during validation.

DNSKEY prefetch — before the validation loop, scan all RRSIGs for signer zones and batch-fetch any missing DNSKEYs. This avoids serial DNSKEY fetches inside the per-RRset verification loop.

Result: a cold-cache query for cloudflare.com with full DNSSEC validation takes ~90ms. The TLD chain is already warm; only one DNSKEY fetch is needed (for cloudflare.com itself).

Operation                          Time
ECDSA P-256 verify                 174 ns
Ed25519 verify                     ~200 ns
RSA/SHA-256 verify                 10.9 µs
DS digest (SHA-256)                257 ns
Key tag computation                20–63 ns
Cold-cache validation (1 fetch)    ~90 ms

The network fetch dominates. The crypto is noise.

Surviving hostile networks

I deployed Numa as my system DNS and switched to a different network. Everything broke. Every query: SERVFAIL, 3-second timeout.

The network probe told the story: the ISP blocks outbound UDP port 53 to all servers except a handful of whitelisted public resolvers (Google, Cloudflare). Root servers, TLD servers, authoritative servers — all unreachable over UDP. The ISP forces you onto their DNS or a blessed upstream. Recursive resolution is impossible.

Except TCP port 53 worked fine. And every DNS server is required to support TCP (RFC 1035 section 4.2.2). The ISP apparently only filters UDP.

The fix has three parts:

TCP fallback. Every outbound query tries UDP first (800ms timeout). If UDP fails or the response is truncated, retry immediately over TCP. TCP uses a 2-byte length prefix before the DNS message — trivial to implement, and it handles DNSSEC responses that exceed the UDP payload limit.

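The framing really is trivial. A sketch of the TCP path with tokio:

use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

/// DNS over TCP (RFC 1035 §4.2.2): same message bytes, prefixed with a
/// 2-byte big-endian length. The response comes back framed the same way.
async fn query_tcp(server: std::net::SocketAddr, msg: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut stream = TcpStream::connect(server).await?;
    stream.write_all(&(msg.len() as u16).to_be_bytes()).await?;
    stream.write_all(msg).await?;

    let mut len_buf = [0u8; 2];
    stream.read_exact(&mut len_buf).await?;
    let len = u16::from_be_bytes(len_buf) as usize;
    let mut resp = vec![0u8; len];
    stream.read_exact(&mut resp).await?;
    Ok(resp)
}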

UDP auto-disable. After 3 consecutive UDP failures, flip a global AtomicBool and skip UDP entirely — go TCP-first for all queries. This avoids burning 800ms per hop on a network where UDP will never work. The flag resets when the network changes (detected via LAN IP monitoring).

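The bookkeeping for that is small. A sketch with illustrative names, mirroring the behavior described above:

use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

static UDP_DISABLED: AtomicBool = AtomicBool::new(false);
static UDP_FAILURES: AtomicU32 = AtomicU32::new(0);

fn record_udp_result(ok: bool) {
    if ok {
        UDP_FAILURES.store(0, Ordering::Relaxed); // failures must be consecutive
    } else if UDP_FAILURES.fetch_add(1, Ordering::Relaxed) + 1 >= 3 {
        UDP_DISABLED.store(true, Ordering::Relaxed); // go TCP-first from now on
    }
}

fn udp_enabled() -> bool {
    !UDP_DISABLED.load(Ordering::Relaxed)
}

/// Called when the LAN IP changes: give UDP another chance on the new network.
fn on_network_change() {
    UDP_FAILURES.store(0, Ordering::Relaxed);
    UDP_DISABLED.store(false, Ordering::Relaxed);
}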

Query minimization (RFC 7816). When querying root servers, send only the TLD — com instead of secret-project.example.com. Root servers handle trillions of queries and are operated by 12 organizations. Minimization reduces what they learn from yours.

The result: on a network that blocks UDP:53, Numa detects the block within the first 3 queries, switches to TCP, and resolves normally at 300-500ms per cold query. Cached queries remain 0ms. No manual config change needed — switch networks and it adapts.

I wouldn’t have found this without dogfooding. The code worked perfectly on my home network. It took a real hostile network to expose the assumption that UDP always works.

What I learned

DNSSEC is a verification system, not an encryption system. It proves authenticity — this record was signed by the zone owner. It doesn’t hide what you’re querying. For privacy, you still need encrypted transport (DoH/DoT) or recursive resolution (no single upstream).

The hardest bugs are in data serialization, not crypto. ring either verifies or it doesn’t — a binary answer. But getting the signed data blob exactly right (correct TTL, correct case, correct sort, correct RDATA encoding for each record type) requires extreme precision. A single wrong byte means verification fails with no hint about what’s wrong.

Negative proofs are harder than positive proofs. Verifying a record exists: verify one RRSIG. Proving a record doesn’t exist: find the right NSEC/NSEC3 records, verify their RRSIGs, check gap coverage, check wildcard denial, compute hashes. The NSEC3 closest encloser proof alone has three sub-proofs, each requiring hash computation and range checking.

Performance optimization is about avoiding network, not avoiding CPU. The crypto takes nanoseconds to microseconds. The network fetch takes tens of milliseconds. Every optimization that matters — TLD priming, DS piggybacking, DNSKEY prefetch — is about eliminating a round trip, not speeding up a hash.

What’s next

The code is at github.com/razvandimescu/numa — the DNSSEC validation is in src/dnssec.rs and the recursive resolver in src/recursive.rs. MIT license.

diff --git a/site/blog/index.html b/site/blog/index.html index dafa814..f19149c 100644 --- a/site/blog/index.html +++ b/site/blog/index.html @@ -168,14 +168,14 @@ body::before {

Blog