feat: recursive DNS + DNSSEC + TCP fallback #17

Merged
razvandimescu merged 10 commits from feat/recursive-resolution into main 2026-03-28 10:03:47 +08:00
11 changed files with 1372 additions and 103 deletions
Showing only changes of commit 5b2cc874a1

@@ -8,7 +8,7 @@ I wanted to understand how DNS actually works. Not the "it translates domain nam
So I built one from scratch in Rust. No `hickory-dns`, no `trust-dns`, no `simple-dns`. The entire RFC 1035 wire protocol — headers, labels, compression pointers, record types — parsed and serialized by hand. It started as a weekend learning project, became a side project I kept coming back to over 6 years, and eventually turned into [Numa](https://github.com/razvandimescu/numa) — which I now use as my actual system DNS.
A note on terminology before we go further: Numa is currently a *forwarding* resolver — it parses and caches DNS packets, but forwards queries to an upstream (Quad9, Cloudflare, or any DoH provider) rather than walking the delegation chain from root servers itself. Think of it as a smart proxy that does useful things with your DNS traffic locally (caching, ad blocking, overrides, local service domains) before forwarding what it can't answer. Full recursive resolution — where Numa talks directly to root and authoritative nameservers — is on the roadmap, along with DNSSEC validation.
A note on terminology: Numa supports two resolution modes. *Forward* mode relays queries to an upstream (Quad9, Cloudflare, or any DoH provider). *Recursive* mode walks the delegation chain from root servers itself — iterative queries to root, TLD, and authoritative nameservers, with full DNSSEC validation. In both modes, Numa does useful things with your DNS traffic locally (caching, ad blocking, overrides, local service domains) before resolving what it can't answer. This post covers the wire protocol and forwarding path; [the next post](/blog/dnssec-from-scratch.html) covers recursive resolution and DNSSEC.
Here's what surprised me along the way.
@@ -315,14 +315,13 @@ That creates the DNS entry, generates a TLS certificate, and starts proxying —
## What's next
Numa is at v0.5.0 with DNS forwarding, caching, ad blocking, DNS-over-HTTPS, .numa local domains with auto TLS, and LAN service discovery.
**Update (March 2026):** Recursive resolution and DNSSEC validation are now shipped. Numa resolves from root nameservers with full chain-of-trust verification (RSA/SHA-256, ECDSA P-256, Ed25519) and NSEC/NSEC3 authenticated denial of existence.
On the roadmap:
**[Read the follow-up: Implementing DNSSEC from Scratch in Rust →](/blog/dnssec-from-scratch.html)**
Still on the roadmap:
- **DoT (DNS-over-TLS)** — DoH was first because it passes through captive portals and corporate firewalls (port 443 vs 853). DoT has less framing overhead, so it's faster. Both will be available.
- **Recursive resolution** — walk the delegation chain from root servers instead of forwarding. Combined with DNSSEC validation, this removes the need to trust any upstream resolver.
- **[pkarr](https://github.com/pubky/pkarr) integration** — self-sovereign DNS via the Mainline BitTorrent DHT. Publish DNS records signed with your Ed25519 key, no registrar needed.
But those are rabbit holes for future posts.
[github.com/razvandimescu/numa](https://github.com/razvandimescu/numa)

246
blog/dnssec-from-scratch.md Normal file
@@ -0,0 +1,246 @@
---
title: Implementing DNSSEC from Scratch in Rust
description: Recursive resolution from root hints, chain-of-trust validation, NSEC/NSEC3 denial proofs, and what I learned implementing DNSSEC with zero DNS libraries.
date: March 2026
---
In the [previous post](/blog/dns-from-scratch.html) I covered how DNS works at the wire level — packet format, label compression, TTL caching, DoH. Numa was a forwarding resolver: it parsed packets, did useful things locally, and relayed the rest to Cloudflare or Quad9.
That post ended with "recursive resolution and DNSSEC are on the roadmap." This post is about building both.
The short version: Numa now resolves from root nameservers with iterative queries, validates the full DNSSEC chain of trust, and cryptographically proves that non-existent domains don't exist. No upstream dependency. No DNS libraries. Just `ring` for the crypto primitives and a lot of RFC reading.
## Why recursive?
A forwarding resolver trusts its upstream. When you ask Quad9 for `cloudflare.com`, you trust that Quad9 returns the real answer. If Quad9 lies, gets compromised, or is legally compelled to redirect you — you have no way to know.
A recursive resolver doesn't trust anyone. It starts at the root nameservers (operated by 12 independent organizations) and follows the delegation chain: root → `.com` TLD → `cloudflare.com` authoritative servers. Each server only answers for its own zone. No single entity sees your full query pattern.
DNSSEC adds cryptographic proof to each step. The root signs `.com`'s key. `.com` signs `cloudflare.com`'s key. `cloudflare.com` signs its own records. If any step is tampered with, the chain breaks and Numa rejects the response.
## The iterative resolution loop
Recursive resolution is a misnomer — the resolver actually uses *iterative* queries. It asks root "where is `cloudflare.com`?", root says "I don't know, but here are the `.com` nameservers." It asks `.com`, which says "here are cloudflare's nameservers." It asks those, and gets the answer.
```
resolve("cloudflare.com", A)
  → ask 198.41.0.4 (a.root-servers.net)
    ← "try .com: a.gtld-servers.net (192.5.6.30)" [referral + glue]
  → ask 192.5.6.30 (a.gtld-servers.net)
    ← "try cloudflare: ns1.cloudflare.com (173.245.58.51)" [referral + glue]
  → ask 173.245.58.51 (ns1.cloudflare.com)
    ← "104.16.132.229" [answer]
```
The implementation (`src/recursive.rs`) is a loop with three possible outcomes per query:
1. **Answer** — the server knows the record. Cache it, return it.
2. **Referral** — the server delegates to another zone. Extract NS records and glue (A/AAAA records for the nameservers, included in the additional section to avoid a chicken-and-egg problem), then query the next server.
3. **NXDOMAIN/REFUSED** — the name doesn't exist or the server refuses. Cache the negative result.
CNAME chasing adds complexity: if you ask for `www.cloudflare.com` and get a CNAME to `cloudflare.com`, you need to restart resolution for the new name. I cap this at 8 levels.
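The loop can be sketched roughly as follows. This is not the code in `src/recursive.rs`: the `Outcome` enum, the `query` callback (a stand-in for one UDP exchange), and the `MAX_REFERRALS` limit are assumptions made for illustration. Only the 8-level CNAME cap comes from the description above, and a CNAME is modeled here as a fourth outcome that restarts resolution from the root.

```rust
use std::net::Ipv4Addr;

/// What one query to one nameserver can produce (hypothetical type).
pub enum Outcome {
    Answer(Ipv4Addr),
    /// Glue addresses of the nameservers for the delegated zone.
    Referral(Vec<Ipv4Addr>),
    Cname(String),
    /// NXDOMAIN or REFUSED.
    Negative,
}

const MAX_CNAME_DEPTH: usize = 8;
/// Assumed safety limit so a misbehaving delegation cannot loop forever.
const MAX_REFERRALS: usize = 16;

/// Follow referrals from `root` until an answer or a negative result.
/// `query` stands in for the network: (server, name) -> Outcome.
pub fn resolve<F>(root: Ipv4Addr, qname: &str, mut query: F) -> Option<Ipv4Addr>
where
    F: FnMut(Ipv4Addr, &str) -> Outcome,
{
    let mut name = qname.to_ascii_lowercase();
    // One pass for the original name plus up to 8 CNAME restarts.
    for _ in 0..=MAX_CNAME_DEPTH {
        let mut server = root;
        let mut next_name = None;
        for _ in 0..MAX_REFERRALS {
            match query(server, &name) {
                Outcome::Answer(addr) => return Some(addr),
                Outcome::Negative => return None,
                // Without glue this sketch gives up; a real resolver
                // would resolve the nameserver's own address first.
                Outcome::Referral(glue) => server = *glue.first()?,
                Outcome::Cname(target) => {
                    next_name = Some(target);
                    break;
                }
            }
        }
        // No CNAME seen means the referral limit was hit: give up.
        name = next_name?;
    }
    None
}
```

Passing the network exchange in as a closure keeps the control flow testable without sockets, which is the main reason for this shape.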
### TLD priming
Cold-cache resolution is slow. Every query needs root → TLD → authoritative, each with its own network round-trip. For the first query to `example.com`, that's three serial UDP round-trips before you get an answer.
TLD priming solves this. On startup, Numa queries root for NS records of 34 common TLDs (`.com`, `.org`, `.net`, `.io`, `.dev`, plus EU ccTLDs), caching NS records, glue addresses, DS records, and DNSKEY records. After priming, the first query to any `.com` domain skips root entirely — it already knows where `.com`'s nameservers are, and already has the DNSSEC keys needed to validate the response.
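One way to picture how a primed cache lets a query skip the root: before sending anything, look for the deepest zone whose nameservers are already cached. The function name `closest_cached_zone` and the `HashMap` cache shape below are hypothetical, chosen only to illustrate the lookup.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Return the deepest cached zone that encloses `qname`, with its
/// nameserver addresses. `None` means "start from the root hints".
/// Cache keys are assumed to be lowercase names without a trailing dot.
pub fn closest_cached_zone<'a>(
    cache: &'a HashMap<String, Vec<IpAddr>>,
    qname: &str,
) -> Option<(&'a str, &'a [IpAddr])> {
    let name = qname.trim_end_matches('.').to_ascii_lowercase();
    let mut candidate = name.as_str();
    loop {
        if let Some((zone, servers)) = cache.get_key_value(candidate) {
            return Some((zone.as_str(), servers.as_slice()));
        }
        // Strip the leftmost label and try the parent.
        match candidate.split_once('.') {
            Some((_, parent)) => candidate = parent,
            None => return None,
        }
    }
}
```

With `com` primed, a lookup for `www.cloudflare.com` returns the `.com` servers directly, so the first network query goes to the TLD rather than to a root server.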
## DNSSEC chain of trust
DNSSEC doesn't encrypt DNS traffic. It *signs* it. Every DNS record can have an accompanying RRSIG (signature) record. The resolver verifies the signature against the zone's DNSKEY, then verifies that DNSKEY against the parent zone's DS (delegation signer) record, walking up until it reaches the root trust anchor — a hardcoded public key that IANA publishes and the entire internet agrees on.
```
cloudflare.com A 104.16.132.229
signed by → RRSIG (key_tag=34505, algo=13, signer=cloudflare.com)
verified with → DNSKEY (cloudflare.com, key_tag=34505, ECDSA P-256)
vouched for by → DS (at .com, key_tag=2371, digest=SHA-256 of cloudflare's DNSKEY)
signed by → RRSIG (key_tag=19718, signer=com)
verified with → DNSKEY (com, key_tag=19718)
vouched for by → DS (at root, key_tag=30909)
signed by → RRSIG (signer=.)
verified with → DNSKEY (., key_tag=20326) ← root trust anchor (hardcoded)
```
### The trust anchor
IANA's root KSK (Key Signing Key) has key tag 20326, algorithm 8 (RSA/SHA-256), and a 2048-bit (256-byte) modulus. It was last rolled in 2018. I hardcode it as a `const` array — this is the one thing in the entire system that requires out-of-band trust.
```rust
const ROOT_KSK_PUBLIC_KEY: &[u8] = &[
    0x03, 0x01, 0x00, 0x01, 0xac, 0xff, 0xb4, 0x09,
    // ... 260 bytes total: 4-byte exponent header + 256-byte modulus
];
```
When IANA rolls this key (rare — the previous key lasted from 2010 to 2018), every DNSSEC validator on the internet needs updating. For Numa, that means a binary update. Something to watch.
### Key tag computation
Every DNSKEY has a key tag — a 16-bit identifier computed per RFC 4034 Appendix B. It's a simple checksum over the DNSKEY RDATA (flags + protocol + algorithm + public key), summing 16-bit words with carry:
```rust
pub fn compute_key_tag(flags: u16, protocol: u8, algorithm: u8, public_key: &[u8]) -> u16 {
    // Rebuild the DNSKEY RDATA: flags (2) + protocol (1) + algorithm (1) + key
    let mut rdata = Vec::with_capacity(4 + public_key.len());
    rdata.push((flags >> 8) as u8);
    rdata.push((flags & 0xFF) as u8);
    rdata.push(protocol);
    rdata.push(algorithm);
    rdata.extend_from_slice(public_key);
    // Sum the RDATA as big-endian 16-bit words
    let mut ac: u32 = 0;
    for (i, &byte) in rdata.iter().enumerate() {
        if i % 2 == 0 { ac += (byte as u32) << 8; }
        else { ac += byte as u32; }
    }
    // Fold the carry back into the low 16 bits
    ac += (ac >> 16) & 0xFFFF;
    (ac & 0xFFFF) as u16
}
```
```
The first test I wrote: compute the root KSK's key tag and assert it equals 20326. Instant confidence that the RDATA encoding is correct.
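The shape of such a test can be shown with small hand-computed vectors instead of the real root KSK bytes, which are not reproduced here. The variant below is an assumption for illustration: `key_tag_from_rdata` is a hypothetical helper that works directly on already-assembled RDATA, and like the function above it does not handle the special case RFC 4034 defines for algorithm 1.

```rust
/// RFC 4034 Appendix B checksum over complete DNSKEY RDATA
/// (flags + protocol + algorithm + public key), without allocating.
pub fn key_tag_from_rdata(rdata: &[u8]) -> u16 {
    let mut ac: u32 = 0;
    for (i, &byte) in rdata.iter().enumerate() {
        // Even offsets are the high byte of a 16-bit word, odd the low byte.
        ac += if i % 2 == 0 { (byte as u32) << 8 } else { byte as u32 };
    }
    // Fold the carry once, as the RFC's reference code does.
    ac += (ac >> 16) & 0xFFFF;
    (ac & 0xFFFF) as u16
}

/// Hand-checked vector: 0x0100 + 0x01 + 0x0300 + 0x08 + 0x0100 + 0x02 + 0x0300.
pub fn small_vector_tag() -> u16 {
    key_tag_from_rdata(&[0x01, 0x01, 0x03, 0x08, 0x01, 0x02, 0x03])
}
```

A second vector with a long key of `0xFF` bytes is worth having as well, because it pushes the sum past 16 bits and exercises the carry fold.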
## The crypto
Numa uses `ring` for all cryptographic operations. Three algorithms cover the vast majority of signed zones:
| Algorithm | ID | Usage | Verify time |
|---|---|---|---|
| RSA/SHA-256 | 8 | Root, most TLDs | 10.9 µs |
| ECDSA P-256 | 13 | Cloudflare, many modern zones | 174 ns |
| Ed25519 | 15 | Newer zones | ~200 ns |
### RSA key format conversion
DNS stores RSA public keys in RFC 3110 format: exponent length (1 or 3 bytes), exponent, modulus. `ring` expects PKCS#1 DER (ASN.1 encoded). Converting between them means writing a minimal ASN.1 encoder:
```rust
fn rsa_dnskey_to_der(public_key: &[u8]) -> Option<Vec<u8>> {
    // Parse RFC 3110: [exp_len] [exponent] [modulus]
    let (exp_len, exp_start) = if *public_key.first()? == 0 {
        // A zero first byte means the length is in the next two bytes
        let len = u16::from_be_bytes([*public_key.get(1)?, *public_key.get(2)?]) as usize;
        (len, 3)
    } else {
        (public_key[0] as usize, 1)
    };
    // `get` returns None instead of panicking on a truncated key
    let exponent = public_key.get(exp_start..exp_start + exp_len)?;
    let modulus = public_key.get(exp_start + exp_len..)?;
    // Build ASN.1 DER: SEQUENCE { INTEGER modulus, INTEGER exponent }
    let mod_der = asn1_integer(modulus);
    let exp_der = asn1_integer(exponent);
    // ... wrap in SEQUENCE tag + length
}
```
The `asn1_integer` function handles leading-zero stripping (DER integers must be minimal) and sign-bit padding (high bit set means negative in ASN.1, so positive numbers need a `0x00` prefix). Getting this wrong produces keys that `ring` silently rejects — one of the harder bugs to track down.
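For readers who want to see the whole conversion end to end, here is a self-contained sketch using only the standard library. It follows the description above but is not Numa's code: the `der_len` helper and the exact bodies of `asn1_integer` and `rsa_dnskey_to_der` are my own reconstruction under that description.

```rust
/// DER length: short form below 128, long form (0x80 | byte count) otherwise.
pub fn der_len(len: usize) -> Vec<u8> {
    if len < 0x80 {
        return vec![len as u8];
    }
    let bytes = len.to_be_bytes();
    let first = bytes.iter().position(|&b| b != 0).unwrap_or(0);
    let mut out = vec![0x80 | (bytes.len() - first) as u8];
    out.extend_from_slice(&bytes[first..]);
    out
}

/// Encode an unsigned big-endian integer as a DER INTEGER.
pub fn asn1_integer(value: &[u8]) -> Vec<u8> {
    // Strip leading zeros: DER integers must be minimal.
    let start = value.iter().position(|&b| b != 0).unwrap_or(value.len());
    let trimmed = &value[start..];
    let mut body = Vec::with_capacity(trimmed.len() + 1);
    // Pad with 0x00 if the value is zero or its high bit is set,
    // otherwise the number would read as negative.
    if trimmed.first().map_or(true, |&b| b & 0x80 != 0) {
        body.push(0x00);
    }
    body.extend_from_slice(trimmed);
    let mut out = vec![0x02]; // INTEGER tag
    out.extend(der_len(body.len()));
    out.extend(body);
    out
}

/// RFC 3110 public key -> PKCS#1 RSAPublicKey DER.
pub fn rsa_dnskey_to_der(public_key: &[u8]) -> Option<Vec<u8>> {
    let (exp_len, exp_start) = match *public_key.first()? {
        0 => (
            u16::from_be_bytes([*public_key.get(1)?, *public_key.get(2)?]) as usize,
            3,
        ),
        n => (n as usize, 1),
    };
    let exponent = public_key.get(exp_start..exp_start + exp_len)?;
    let modulus = public_key.get(exp_start + exp_len..)?;
    if exponent.is_empty() || modulus.is_empty() {
        return None;
    }
    let mut body = asn1_integer(modulus);
    body.extend(asn1_integer(exponent));
    let mut der = vec![0x30]; // SEQUENCE tag
    der.extend(der_len(body.len()));
    der.extend(body);
    Some(der)
}
```

A toy key with exponent 65537 and the two-byte modulus `C0 01` exercises the sign-bit padding: the modulus is emitted as `02 03 00 C0 01`.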
### ECDSA is simpler
ECDSA P-256 keys in DNS are 64 bytes (x + y coordinates). `ring` expects uncompressed point format: `0x04` prefix + 64 bytes. The conversion is a single prepended byte:
```rust
let mut uncompressed = Vec::with_capacity(65);
uncompressed.push(0x04);
uncompressed.extend_from_slice(public_key); // 64 bytes from DNS
```
Signatures are also 64 bytes (r + s), used directly. No format conversion needed.
### Building the signed data
An RRSIG doesn't cover the DNS packet; it covers a canonical form of the records. Building this correctly is the most detail-sensitive part of DNSSEC. The signed data is:
1. RRSIG RDATA fields (type covered, algorithm, labels, original TTL, expiration, inception, key tag, signer name) — *without* the signature itself
2. For each record in the RRset: owner name (lowercased, uncompressed) + type + class + original TTL (from the RRSIG, not the record's current TTL) + RDATA length + canonical RDATA
The records must be sorted by their canonical RDATA, compared byte by byte as unsigned values (RFC 4034 §6.3). Owner names must be lowercased. The TTL must be the *original* TTL from the RRSIG, not the decremented TTL from caching.
Getting any of these details wrong — wrong TTL, wrong case, wrong sort order, wrong RDATA encoding — produces a valid-looking but incorrect signed data blob, and `ring` returns a signature mismatch with no diagnostic information. I spent more time debugging signed data construction than any other part of DNSSEC.
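The assembly described above can be sketched as follows. The names `canonical_name` and `build_signed_data` and their signatures are assumptions for illustration, not Numa's API. The sketch also assumes the RDATA passed in is already in canonical form, that labels are at most 63 bytes, and that the RRset is not a wildcard expansion (which would require rewriting the owner name).

```rust
/// Owner name in wire format: lowercased, uncompressed, root-terminated.
pub fn canonical_name(name: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for label in name.trim_end_matches('.').split('.').filter(|l| !l.is_empty()) {
        out.push(label.len() as u8); // assumes label.len() <= 63
        out.extend(label.to_ascii_lowercase().bytes());
    }
    out.push(0); // root label
    out
}

/// Concatenate the RRSIG RDATA (minus the signature) with every record
/// of the RRset in canonical order.
pub fn build_signed_data(
    rrsig_rdata_without_sig: &[u8],
    owner: &str,
    rtype: u16,
    class: u16,
    original_ttl: u32, // from the RRSIG, not the cached record
    rdatas: &[Vec<u8>],
) -> Vec<u8> {
    // Byte-wise lexicographic order; a shorter RDATA that is a prefix of a
    // longer one sorts first, which is what Vec<u8>'s ordering does.
    let mut sorted: Vec<&Vec<u8>> = rdatas.iter().collect();
    sorted.sort();
    sorted.dedup(); // duplicate records count once

    let owner_wire = canonical_name(owner);
    let mut data = rrsig_rdata_without_sig.to_vec();
    for rdata in sorted {
        data.extend_from_slice(&owner_wire);
        data.extend_from_slice(&rtype.to_be_bytes());
        data.extend_from_slice(&class.to_be_bytes());
        data.extend_from_slice(&original_ttl.to_be_bytes());
        data.extend_from_slice(&(rdata.len() as u16).to_be_bytes());
        data.extend_from_slice(rdata);
    }
    data
}
```

When verification fails, dumping this buffer as hex and comparing it field by field against the list above is usually the fastest way to find the wrong byte.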
## Proving a name doesn't exist
Verifying that `cloudflare.com` has a valid A record is one thing. Proving that `doesnotexist.cloudflare.com` *doesn't* exist — cryptographically, in a way that can't be forged — is harder.
### NSEC
NSEC records form a chain. Each NSEC says "the next name in this zone after me is X, and at my name these record types exist." If you query `beta.example.com` and the zone has `alpha.example.com → NSEC → gamma.example.com`, the gap proves `beta` doesn't exist — there's nothing between `alpha` and `gamma`.
For NXDOMAIN proofs, RFC 4035 §5.4 requires two things:
1. An NSEC record whose gap covers the queried name
2. An NSEC record proving no wildcard exists at the closest encloser
The canonical DNS name ordering (RFC 4034 §6.1) compares labels right-to-left, case-insensitive. `a.example.com` < `b.example.com` because at the `example.com` level they're equal, then `a` < `b`. But `z.example.com` < `a.example.org` because `.com` < `.org` at the TLD level.
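That ordering, and the gap check built on it, can be sketched like this. `canonical_cmp` and `nsec_covers` are hypothetical names, and the sketch assumes plain ASCII presentation-format names with no escaped dots.

```rust
use std::cmp::Ordering;

/// Labels in comparison order: rightmost first, lowercased.
fn canonical_labels(name: &str) -> Vec<Vec<u8>> {
    name.trim_end_matches('.')
        .split('.')
        .filter(|label| !label.is_empty())
        .rev()
        .map(|label| label.to_ascii_lowercase().into_bytes())
        .collect()
}

/// Canonical DNS name order (RFC 4034 §6.1). A name sorts before
/// any of its own subdomains because a missing label sorts first.
pub fn canonical_cmp(a: &str, b: &str) -> Ordering {
    canonical_labels(a).cmp(&canonical_labels(b))
}

/// Does the NSEC record `owner -> next` prove that `qname` is absent?
pub fn nsec_covers(owner: &str, next: &str, qname: &str) -> bool {
    let after_owner = canonical_cmp(owner, qname) == Ordering::Less;
    let before_next = canonical_cmp(qname, next) == Ordering::Less;
    if canonical_cmp(owner, next) == Ordering::Less {
        after_owner && before_next
    } else {
        // The last NSEC in the zone wraps around to the apex.
        after_owner || before_next
    }
}
```

The wrap-around branch matters for names that sort after everything in the zone: they are covered by the final NSEC, whose next name points back to the apex.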
### NSEC3
NSEC3 solves NSEC's zone enumeration problem — with NSEC, you can walk the chain and discover every name in the zone. NSEC3 hashes the names first (iterated SHA-1 with a salt), so the NSEC3 chain reveals hashes, not names.
The proof is a 3-part closest encloser proof (RFC 5155 §8.4):
1. **Closest encloser** — find an ancestor of the queried name whose hash exactly matches an NSEC3 owner
2. **Next closer** — the name one label longer than the closest encloser must fall within an NSEC3 hash range (proving it doesn't exist)
3. **Wildcard denial** — the wildcard at the closest encloser (`*.closest_encloser`) must also fall within an NSEC3 hash range
```rust
// Pre-compute hashes for all ancestors
for i in 0..labels.len() {
    let name: String = labels[i..].join(".");
    ancestor_hashes.push(nsec3_hash(&name, algorithm, iterations, salt)?);
}
// Walk from longest candidate: is this the closest encloser?
for i in 1..labels.len() {
    let ce_hash = &ancestor_hashes[i];
    if !decoded.iter().any(|(oh, _)| oh == ce_hash) { continue; } // (1)
    let nc_hash = &ancestor_hashes[i - 1];
    if !nsec3_any_covers(&decoded, nc_hash) { continue; } // (2)
    let wc = format!("*.{}", labels[i..].join("."));
    let wc_hash = nsec3_hash(&wc, algorithm, iterations, salt)?;
    if nsec3_any_covers(&decoded, &wc_hash) { proven = true; break; } // (3)
}
```
I cap NSEC3 iterations at 500 (RFC 9276 recommends 0). Higher iteration counts are a DoS vector — each verification requires `iterations + 1` SHA-1 hashes.
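The range check and the iteration cap can be sketched as below. `nsec3_any_covers` is the name used in the snippet above, but its signature here is a guess. `nsec3_covers`, `iterations_acceptable`, and the representation of `decoded` as `(owner_hash, next_hash)` pairs of raw hash bytes are assumptions for illustration.

```rust
/// The cap described above.
pub const MAX_NSEC3_ITERATIONS: u16 = 500;

/// Refuse to do the hashing work at all if the zone asks for too much.
pub fn iterations_acceptable(iterations: u16) -> bool {
    iterations <= MAX_NSEC3_ITERATIONS
}

/// Does one NSEC3 record (owner hash -> next hash) cover `target`?
/// An exact match on either end is not "covered": it means the name exists.
pub fn nsec3_covers(owner_hash: &[u8], next_hash: &[u8], target: &[u8]) -> bool {
    if owner_hash < next_hash {
        owner_hash < target && target < next_hash
    } else {
        // The last record in the hash chain wraps around to the first.
        target > owner_hash || target < next_hash
    }
}

/// True if any record in the response covers `target`.
pub fn nsec3_any_covers(decoded: &[(Vec<u8>, Vec<u8>)], target: &[u8]) -> bool {
    decoded
        .iter()
        .any(|(owner, next)| nsec3_covers(owner, next, target))
}
```

Comparing raw hash bytes rather than their base32hex text avoids any dependence on letter case in the encoded owner names.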
## Making it fast
Cold-cache DNSSEC validation initially required ~5 network fetches per query (DNSKEY for each zone in the chain, plus DS records). Three optimizations brought this down to ~1:
**TLD priming** (startup) — fetch root DNSKEY + each TLD's NS/DS/DNSKEY. After priming, the trust chain from root to any `.com` zone is fully cached.
**Referral DS piggybacking** — when a TLD server refers you to `cloudflare.com`'s nameservers, the authority section often includes DS records for the child zone. Cache them during resolution instead of fetching separately during validation.
**DNSKEY prefetch** — before the validation loop, scan all RRSIGs for signer zones and batch-fetch any missing DNSKEYs. This avoids serial DNSKEY fetches inside the per-RRset verification loop.
Result: a cold-cache query for `cloudflare.com` with full DNSSEC validation takes ~90ms. The TLD chain is already warm; only one DNSKEY fetch is needed (for `cloudflare.com` itself).
| Operation | Time |
|---|---|
| ECDSA P-256 verify | 174 ns |
| Ed25519 verify | ~200 ns |
| RSA/SHA-256 verify | 10.9 µs |
| DS digest (SHA-256) | 257 ns |
| Key tag computation | 2063 ns |
| Cold-cache validation (1 fetch) | ~90 ms |
The network fetch dominates. The crypto is noise.
## What I learned
**DNSSEC is a verification system, not an encryption system.** It proves authenticity — this record was signed by the zone owner. It doesn't hide what you're querying. For privacy, you still need encrypted transport (DoH/DoT) or recursive resolution (no single upstream).
**The hardest bugs are in data serialization, not crypto.** `ring` either verifies or it doesn't — a binary answer. But getting the signed data blob exactly right (correct TTL, correct case, correct sort, correct RDATA encoding for each record type) requires extreme precision. A single wrong byte means verification fails with no hint about what's wrong.
**Negative proofs are harder than positive proofs.** Verifying a record exists: verify one RRSIG. Proving a record doesn't exist: find the right NSEC/NSEC3 records, verify their RRSIGs, check gap coverage, check wildcard denial, compute hashes. The NSEC3 closest encloser proof alone has three sub-proofs, each requiring hash computation and range checking.
**Performance optimization is about avoiding network, not avoiding CPU.** The crypto takes nanoseconds to microseconds. The network fetch takes tens of milliseconds. Every optimization that matters — TLD priming, DS piggybacking, DNSKEY prefetch — is about eliminating a round trip, not speeding up a hash.
## What's next
Numa now has 13 feature layers, from basic DNS forwarding through full recursive DNSSEC resolution. The immediate roadmap:
- **DoT (DNS-over-TLS)** — the last encrypted transport we don't support
- **[pkarr](https://github.com/pubky/pkarr) integration** — self-sovereign DNS via the Mainline BitTorrent DHT. Ed25519-signed DNS records published without a registrar.
- **Global `.numa` names** — human-readable names backed by DHT, not ICANN
The code is at [github.com/razvandimescu/numa](https://github.com/razvandimescu/numa). MIT license. The entire DNSSEC implementation is in [`src/dnssec.rs`](https://github.com/razvandimescu/numa/blob/main/src/dnssec.rs) (~1,600 lines) and [`src/recursive.rs`](https://github.com/razvandimescu/numa/blob/main/src/recursive.rs) (~600 lines).

@@ -301,15 +301,16 @@ parsed and serialized by hand. It started as a weekend learning project,
became a side project I kept coming back to over 6 years, and eventually
turned into <a href="https://github.com/razvandimescu/numa">Numa</a>
which I now use as my actual system DNS.</p>
<p>A note on terminology before we go further: Numa is currently a
<em>forwarding</em> resolver — it parses and caches DNS packets, but
forwards queries to an upstream (Quad9, Cloudflare, or any DoH provider)
rather than walking the delegation chain from root servers itself. Think
of it as a smart proxy that does useful things with your DNS traffic
locally (caching, ad blocking, overrides, local service domains) before
forwarding what it can’t answer. Full recursive resolution — where Numa
talks directly to root and authoritative nameservers — is on the
roadmap, along with DNSSEC validation.</p>
<p>A note on terminology: Numa supports two resolution modes.
<em>Forward</em> mode relays queries to an upstream (Quad9, Cloudflare,
or any DoH provider). <em>Recursive</em> mode walks the delegation chain
from root servers itself — iterative queries to root, TLD, and
authoritative nameservers, with full DNSSEC validation. In both modes,
Numa does useful things with your DNS traffic locally (caching, ad
blocking, overrides, local service domains) before resolving what it
can’t answer. This post covers the wire protocol and forwarding path; <a
href="/blog/dnssec-from-scratch.html">the next post</a> covers recursive
resolution and DNSSEC.</p>
<p>Here’s what surprised me along the way.</p>
<h2 id="what-does-a-dns-packet-actually-look-like">What does a DNS
packet actually look like?</h2>
@@ -619,24 +620,23 @@ resolver. The distinction matters to people who work with DNS
professionally, and being sloppy about it cost me credibility in my
first community posts.</p>
<h2 id="whats-next">What’s next</h2>
<p>Numa is at v0.5.0 with DNS forwarding, caching, ad blocking,
DNS-over-HTTPS, .numa local domains with auto TLS, and LAN service
discovery.</p>
<p>On the roadmap:</p>
<p><strong>Update (March 2026):</strong> Recursive resolution and DNSSEC
validation are now shipped. Numa resolves from root nameservers with
full chain-of-trust verification (RSA/SHA-256, ECDSA P-256, Ed25519) and
NSEC/NSEC3 authenticated denial of existence.</p>
<p><strong><a href="/blog/dnssec-from-scratch.html">Read the follow-up:
Implementing DNSSEC from Scratch in Rust →</a></strong></p>
<p>Still on the roadmap:</p>
<ul>
<li><strong>DoT (DNS-over-TLS)</strong> — DoH was first because it
passes through captive portals and corporate firewalls (port 443 vs
853). DoT has less framing overhead, so it’s faster. Both will be
available.</li>
<li><strong>Recursive resolution</strong> — walk the delegation chain
from root servers instead of forwarding. Combined with DNSSEC
validation, this removes the need to trust any upstream resolver.</li>
<li><strong><a href="https://github.com/pubky/pkarr">pkarr</a>
integration</strong> — self-sovereign DNS via the Mainline BitTorrent
DHT. Publish DNS records signed with your Ed25519 key, no registrar
needed.</li>
</ul>
<p>But those are rabbit holes for future posts.</p>
<p><a
href="https://github.com/razvandimescu/numa">github.com/razvandimescu/numa</a></p>
</article>

@@ -0,0 +1,665 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Implementing DNSSEC from Scratch in Rust — Numa</title>
<meta name="description" content="Recursive resolution from root hints,
chain-of-trust validation, NSEC/NSEC3 denial proofs, and what I learned
implementing DNSSEC with zero DNS libraries.">
<link rel="stylesheet" href="/fonts/fonts.css">
<style>
*, *::before, *::after { margin: 0; padding: 0; box-sizing: border-box; }
:root {
--bg-deep: #f5f0e8;
--bg-surface: #ece5da;
--bg-elevated: #e3dbce;
--bg-card: #faf7f2;
--amber: #c0623a;
--amber-dim: #9e4e2d;
--teal: #6b7c4e;
--teal-dim: #566540;
--violet: #64748b;
--text-primary: #2c2418;
--text-secondary: #6b5e4f;
--text-dim: #a39888;
--border: rgba(0, 0, 0, 0.08);
--border-amber: rgba(192, 98, 58, 0.22);
--font-display: 'Instrument Serif', Georgia, serif;
--font-body: 'DM Sans', system-ui, sans-serif;
--font-mono: 'JetBrains Mono', monospace;
}
html { scroll-behavior: smooth; }
body {
background: var(--bg-deep);
color: var(--text-primary);
font-family: var(--font-body);
font-weight: 400;
line-height: 1.7;
-webkit-font-smoothing: antialiased;
}
body::before {
content: '';
position: fixed;
inset: 0;
background-image: url("data:image/svg+xml,%3Csvg viewBox='0 0 256 256' xmlns='http://www.w3.org/2000/svg'%3E%3Cfilter id='n'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='0.9' numOctaves='4' stitchTiles='stitch'/%3E%3C/filter%3E%3Crect width='100%25' height='100%25' filter='url(%23n)' opacity='0.025'/%3E%3C/svg%3E");
pointer-events: none;
z-index: 9999;
}
/* --- Blog nav --- */
.blog-nav {
padding: 1.5rem 2rem;
display: flex;
align-items: center;
gap: 1.5rem;
}
.blog-nav a {
font-family: var(--font-mono);
font-size: 0.75rem;
letter-spacing: 0.08em;
text-transform: uppercase;
color: var(--text-dim);
text-decoration: none;
transition: color 0.2s;
}
.blog-nav a:hover { color: var(--amber); }
.blog-nav .wordmark {
font-family: var(--font-display);
font-size: 1.4rem;
font-weight: 400;
color: var(--text-primary);
text-decoration: none;
letter-spacing: -0.02em;
}
.blog-nav .wordmark:hover { color: var(--amber); }
.blog-nav .sep {
color: var(--text-dim);
font-family: var(--font-mono);
font-size: 0.75rem;
}
/* --- Article --- */
.article {
max-width: 720px;
margin: 0 auto;
padding: 3rem 2rem 6rem;
}
.article-header {
margin-bottom: 3rem;
padding-bottom: 2rem;
border-bottom: 1px solid var(--border);
}
.article-header h1 {
font-family: var(--font-display);
font-weight: 400;
font-size: clamp(2rem, 5vw, 3rem);
line-height: 1.15;
margin-bottom: 1rem;
color: var(--text-primary);
}
.article-meta {
font-family: var(--font-mono);
font-size: 0.75rem;
color: var(--text-dim);
letter-spacing: 0.04em;
}
.article-meta a {
color: var(--amber);
text-decoration: none;
}
.article-meta a:hover { text-decoration: underline; }
/* --- Prose --- */
.article h2 {
font-family: var(--font-display);
font-weight: 600;
font-size: 1.8rem;
line-height: 1.2;
margin: 3rem 0 1rem;
color: var(--text-primary);
}
.article h3 {
font-family: var(--font-body);
font-weight: 600;
font-size: 1.2rem;
margin: 2rem 0 0.75rem;
color: var(--text-primary);
}
.article p {
margin-bottom: 1.25rem;
color: var(--text-secondary);
font-size: 1.05rem;
}
.article a {
color: var(--amber);
text-decoration: underline;
text-decoration-color: rgba(192, 98, 58, 0.3);
text-underline-offset: 2px;
transition: text-decoration-color 0.2s;
}
.article a:hover {
text-decoration-color: var(--amber);
}
.article strong {
color: var(--text-primary);
font-weight: 600;
}
.article ul, .article ol {
margin-bottom: 1.25rem;
padding-left: 1.5rem;
color: var(--text-secondary);
}
.article li {
margin-bottom: 0.4rem;
font-size: 1.05rem;
}
.article blockquote {
border-left: 3px solid var(--amber);
padding: 0.75rem 1.25rem;
margin: 1.5rem 0;
background: rgba(192, 98, 58, 0.04);
border-radius: 0 4px 4px 0;
}
.article blockquote p {
color: var(--text-secondary);
font-style: italic;
margin-bottom: 0;
}
/* --- Code --- */
.article code {
font-family: var(--font-mono);
font-size: 0.88em;
background: var(--bg-elevated);
padding: 0.15em 0.4em;
border-radius: 3px;
color: var(--amber-dim);
}
.article pre {
background: var(--bg-card);
border: 1px solid var(--border);
border-radius: 6px;
padding: 1.25rem 1.5rem;
margin: 1.5rem 0;
overflow-x: auto;
line-height: 1.55;
}
.article pre code {
background: none;
padding: 0;
border-radius: 0;
color: var(--text-primary);
font-size: 0.85rem;
}
/* --- Images --- */
.article img {
max-width: 100%;
border-radius: 6px;
border: 1px solid var(--border);
margin: 1.5rem 0;
}
/* --- Tables --- */
.article table {
width: 100%;
border-collapse: collapse;
margin: 1.5rem 0;
font-size: 0.95rem;
}
.article th {
font-family: var(--font-mono);
font-size: 0.75rem;
letter-spacing: 0.06em;
text-transform: uppercase;
color: var(--text-dim);
text-align: left;
padding: 0.6rem 1rem;
border-bottom: 2px solid var(--border);
}
.article td {
padding: 0.6rem 1rem;
border-bottom: 1px solid var(--border);
color: var(--text-secondary);
}
/* --- Footer --- */
.blog-footer {
text-align: center;
padding: 3rem 2rem;
border-top: 1px solid var(--border);
max-width: 720px;
margin: 0 auto;
}
.blog-footer a {
font-family: var(--font-mono);
font-size: 0.75rem;
letter-spacing: 0.08em;
text-transform: uppercase;
color: var(--text-dim);
text-decoration: none;
margin: 0 1rem;
}
.blog-footer a:hover { color: var(--amber); }
/* --- Responsive --- */
@media (max-width: 640px) {
.article { padding: 2rem 1.25rem 4rem; }
.article pre { padding: 1rem; margin-left: -0.5rem; margin-right: -0.5rem; border-radius: 0; border-left: none; border-right: none; }
}
</style>
</head>
<body>
<nav class="blog-nav">
<a href="/" class="wordmark">Numa</a>
<span class="sep">/</span>
<a href="/blog/">Blog</a>
</nav>
<article class="article">
<header class="article-header">
<h1>Implementing DNSSEC from Scratch in Rust</h1>
<div class="article-meta">
March 2026 · <a href="https://dimescu.ro">Razvan Dimescu</a>
</div>
</header>
<p>In the <a href="/blog/dns-from-scratch.html">previous post</a> I
covered how DNS works at the wire level — packet format, label
compression, TTL caching, DoH. Numa was a forwarding resolver: it parsed
packets, did useful things locally, and relayed the rest to Cloudflare
or Quad9.</p>
<p>That post ended with “recursive resolution and DNSSEC are on the
roadmap.” This post is about building both.</p>
<p>The short version: Numa now resolves from root nameservers with
iterative queries, validates the full DNSSEC chain of trust, and
cryptographically proves that non-existent domains don’t exist. No
upstream dependency. No DNS libraries. Just <code>ring</code> for the
crypto primitives and a lot of RFC reading.</p>
<h2 id="why-recursive">Why recursive?</h2>
<p>A forwarding resolver trusts its upstream. When you ask Quad9 for
<code>cloudflare.com</code>, you trust that Quad9 returns the real
answer. If Quad9 lies, gets compromised, or is legally compelled to
redirect you — you have no way to know.</p>
<p>A recursive resolver doesn’t trust anyone. It starts at the root
nameservers (operated by 12 independent organizations) and follows the
delegation chain: root → <code>.com</code> TLD →
<code>cloudflare.com</code> authoritative servers. Each server only
answers for its own zone. No single entity sees your full query
pattern.</p>
<p>DNSSEC adds cryptographic proof to each step. The root signs
<code>.com</code>’s key. <code>.com</code> signs
<code>cloudflare.com</code>’s key. <code>cloudflare.com</code> signs its
own records. If any step is tampered with, the chain breaks and Numa
rejects the response.</p>
<h2 id="the-iterative-resolution-loop">The iterative resolution
loop</h2>
<p>Recursive resolution is a misnomer — the resolver actually uses
<em>iterative</em> queries. It asks root “where is
<code>cloudflare.com</code>?”, root says “I don’t know, but here are the
<code>.com</code> nameservers.” It asks <code>.com</code>, which says
“here are cloudflare’s nameservers.” It asks those, and gets the
answer.</p>
<pre><code>resolve(&quot;cloudflare.com&quot;, A)
  → ask 198.41.0.4 (a.root-servers.net)
    ← &quot;try .com: a.gtld-servers.net (192.5.6.30)&quot; [referral + glue]
  → ask 192.5.6.30 (a.gtld-servers.net)
    ← &quot;try cloudflare: ns1.cloudflare.com (173.245.58.51)&quot; [referral + glue]
  → ask 173.245.58.51 (ns1.cloudflare.com)
    ← &quot;104.16.132.229&quot; [answer]</code></pre>
<p>The implementation (<code>src/recursive.rs</code>) is a loop with
three possible outcomes per query:</p>
<ol type="1">
<li><strong>Answer</strong> — the server knows the record. Cache it,
return it.</li>
<li><strong>Referral</strong> — the server delegates to another zone.
Extract NS records and glue (A/AAAA records for the nameservers,
included in the additional section to avoid a chicken-and-egg problem),
then query the next server.</li>
<li><strong>NXDOMAIN/REFUSED</strong> — the name doesn’t exist or the
server refuses. Cache the negative result.</li>
</ol>
<p>CNAME chasing adds complexity: if you ask for
<code>www.cloudflare.com</code> and get a CNAME to
<code>cloudflare.com</code>, you need to restart resolution for the new
name. I cap this at 8 levels.</p>
<h3 id="tld-priming">TLD priming</h3>
<p>Cold-cache resolution is slow. Every query needs root → TLD →
authoritative, each with its own network round-trip. For the first query
to <code>example.com</code>, that’s three serial UDP round-trips before
you get an answer.</p>
<p>TLD priming solves this. On startup, Numa queries root for NS records
of 34 common TLDs (<code>.com</code>, <code>.org</code>,
<code>.net</code>, <code>.io</code>, <code>.dev</code>, plus EU ccTLDs),
caching NS records, glue addresses, DS records, and DNSKEY records.
After priming, the first query to any <code>.com</code> domain skips
root entirely — it already knows where <code>.com</code>’s nameservers
are, and already has the DNSSEC keys needed to validate the
response.</p>
<h2 id="dnssec-chain-of-trust">DNSSEC chain of trust</h2>
<p>DNSSEC doesn’t encrypt DNS traffic. It <em>signs</em> it. Every DNS
record can have an accompanying RRSIG (signature) record. The resolver
verifies the signature against the zone’s DNSKEY, then verifies that
DNSKEY against the parent zone’s DS (delegation signer) record, walking
up until it reaches the root trust anchor — a hardcoded public key that
IANA publishes and the entire internet agrees on.</p>
<pre><code>cloudflare.com A 104.16.132.229
signed by → RRSIG (key_tag=34505, algo=13, signer=cloudflare.com)
verified with → DNSKEY (cloudflare.com, key_tag=34505, ECDSA P-256)
vouched for by → DS (at .com, key_tag=2371, digest=SHA-256 of cloudflare&#39;s DNSKEY)
signed by → RRSIG (key_tag=19718, signer=com)
verified with → DNSKEY (com, key_tag=19718)
vouched for by → DS (at root, key_tag=30909)
signed by → RRSIG (signer=.)
verified with → DNSKEY (., key_tag=20326) ← root trust anchor (hardcoded)</code></pre>
<h3 id="the-trust-anchor">The trust anchor</h3>
<p>IANA's root KSK (Key Signing Key) has key tag 20326, algorithm 8
(RSA/SHA-256), and a 2048-bit (256-byte) modulus. It was last rolled in
2018. I hardcode it as a <code>const</code> array; this is the one thing
in the entire system that requires out-of-band trust.</p>
<div class="sourceCode" id="cb3"><pre
class="sourceCode rust"><code class="sourceCode rust"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">const</span> ROOT_KSK_PUBLIC_KEY<span class="op">:</span> <span class="op">&amp;</span>[<span class="dt">u8</span>] <span class="op">=</span> <span class="op">&amp;</span>[</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> <span class="dv">0x03</span><span class="op">,</span> <span class="dv">0x01</span><span class="op">,</span> <span class="dv">0x00</span><span class="op">,</span> <span class="dv">0x01</span><span class="op">,</span> <span class="dv">0xac</span><span class="op">,</span> <span class="dv">0xff</span><span class="op">,</span> <span class="dv">0xb4</span><span class="op">,</span> <span class="dv">0x09</span><span class="op">,</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="co">// ... 260 bytes total: 4-byte exponent header + 256-byte modulus</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>]<span class="op">;</span></span></code></pre></div>
<p>When IANA rolls this key (rare — the previous key lasted from 2010 to
2018), every DNSSEC validator on the internet needs updating. For Numa,
that means a binary update. Something to watch.</p>
<h3 id="key-tag-computation">Key tag computation</h3>
<p>Every DNSKEY has a key tag, a 16-bit identifier computed per RFC
4034 Appendix B. It's a simple checksum over the DNSKEY RDATA (flags +
protocol + algorithm + public key): sum the bytes as big-endian 16-bit
words, then fold the carry back into the low 16 bits:</p>
<div class="sourceCode" id="cb4"><pre
class="sourceCode rust"><code class="sourceCode rust"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">pub</span> <span class="kw">fn</span> compute_key_tag(flags<span class="op">:</span> <span class="dt">u16</span><span class="op">,</span> protocol<span class="op">:</span> <span class="dt">u8</span><span class="op">,</span> algorithm<span class="op">:</span> <span class="dt">u8</span><span class="op">,</span> public_key<span class="op">:</span> <span class="op">&amp;</span>[<span class="dt">u8</span>]) <span class="op">-&gt;</span> <span class="dt">u16</span> <span class="op">{</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> <span class="kw">mut</span> rdata <span class="op">=</span> <span class="dt">Vec</span><span class="pp">::</span>with_capacity(<span class="dv">4</span> <span class="op">+</span> public_key<span class="op">.</span>len())<span class="op">;</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> rdata<span class="op">.</span>push((flags <span class="op">&gt;&gt;</span> <span class="dv">8</span>) <span class="kw">as</span> <span class="dt">u8</span>)<span class="op">;</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a> rdata<span class="op">.</span>push((flags <span class="op">&amp;</span> <span class="dv">0xFF</span>) <span class="kw">as</span> <span class="dt">u8</span>)<span class="op">;</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a> rdata<span class="op">.</span>push(protocol)<span class="op">;</span></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a> rdata<span class="op">.</span>push(algorithm)<span class="op">;</span></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a> rdata<span class="op">.</span>extend_from_slice(public_key)<span class="op">;</span></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> <span class="kw">mut</span> ac<span class="op">:</span> <span class="dt">u32</span> <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (i<span class="op">,</span> <span class="op">&amp;</span>byte) <span class="kw">in</span> rdata<span class="op">.</span>iter()<span class="op">.</span>enumerate() <span class="op">{</span></span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> i <span class="op">%</span> <span class="dv">2</span> <span class="op">==</span> <span class="dv">0</span> <span class="op">{</span> ac <span class="op">+=</span> (byte <span class="kw">as</span> <span class="dt">u32</span>) <span class="op">&lt;&lt;</span> <span class="dv">8</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a> <span class="cf">else</span> <span class="op">{</span> ac <span class="op">+=</span> byte <span class="kw">as</span> <span class="dt">u32</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a> ac <span class="op">+=</span> (ac <span class="op">&gt;&gt;</span> <span class="dv">16</span>) <span class="op">&amp;</span> <span class="dv">0xFFFF</span><span class="op">;</span></span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a> (ac <span class="op">&amp;</span> <span class="dv">0xFFFF</span>) <span class="kw">as</span> <span class="dt">u16</span></span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>The first test I wrote: compute the root KSK's key tag and assert it
equals 20326. Instant confidence that the RDATA encoding is correct.</p>
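<p>The same checksum can be computed without building an intermediate <code>Vec</code>, by chaining the four header bytes with the key bytes. This is an illustrative variant, not the function from the post; <code>key_tag_no_alloc</code> is a hypothetical name, and the test vectors are small hand-computed inputs rather than real keys.</p>

```rust
/// RFC 4034 Appendix B key tag, computed without a temporary buffer.
/// Even-indexed bytes are the high half of each 16-bit word, odd-indexed
/// bytes the low half. A u32 accumulator cannot overflow for RDATA up to
/// 65535 bytes.
pub fn key_tag_no_alloc(flags: u16, protocol: u8, algorithm: u8, public_key: &[u8]) -> u16 {
    let header = [(flags >> 8) as u8, (flags & 0xFF) as u8, protocol, algorithm];
    let mut ac: u32 = 0;
    for (i, &byte) in header.iter().chain(public_key.iter()).enumerate() {
        ac += if i % 2 == 0 {
            (byte as u32) << 8
        } else {
            byte as u32
        };
    }
    // Fold the carry out of the low 16 bits back in.
    ac += (ac >> 16) & 0xFFFF;
    (ac & 0xFFFF) as u16
}
```

<p>For example, flags <code>0x0101</code>, protocol 3, algorithm 8 and key bytes <code>AB CD</code> sum to <code>0x0101 + 0x0308 + 0xABCD = 0xAFD6</code> with no carry.</p>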
<h2 id="the-crypto">The crypto</h2>
<p>Numa uses <code>ring</code> for all cryptographic operations. Three
algorithms cover the vast majority of signed zones:</p>
<table>
<thead>
<tr>
<th>Algorithm</th>
<th>ID</th>
<th>Usage</th>
<th>Verify time</th>
</tr>
</thead>
<tbody>
<tr>
<td>RSA/SHA-256</td>
<td>8</td>
<td>Root, most TLDs</td>
<td>10.9 µs</td>
</tr>
<tr>
<td>ECDSA P-256</td>
<td>13</td>
<td>Cloudflare, many modern zones</td>
<td>174 ns</td>
</tr>
<tr>
<td>Ed25519</td>
<td>15</td>
<td>Newer zones</td>
<td>~200 ns</td>
</tr>
</tbody>
</table>
<h3 id="rsa-key-format-conversion">RSA key format conversion</h3>
<p>DNS stores RSA public keys in RFC 3110 format: exponent length (1 or
3 bytes), exponent, modulus. <code>ring</code> expects PKCS#1 DER (ASN.1
encoded). Converting between them means writing a minimal ASN.1
encoder:</p>
<div class="sourceCode" id="cb5"><pre
class="sourceCode rust"><code class="sourceCode rust"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">fn</span> rsa_dnskey_to_der(public_key<span class="op">:</span> <span class="op">&amp;</span>[<span class="dt">u8</span>]) <span class="op">-&gt;</span> <span class="dt">Option</span><span class="op">&lt;</span><span class="dt">Vec</span><span class="op">&lt;</span><span class="dt">u8</span><span class="op">&gt;&gt;</span> <span class="op">{</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> <span class="co">// Parse RFC 3110: [exp_len] [exponent] [modulus]</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> (exp_len<span class="op">,</span> exp_start) <span class="op">=</span> <span class="cf">if</span> public_key[<span class="dv">0</span>] <span class="op">==</span> <span class="dv">0</span> <span class="op">{</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> len <span class="op">=</span> <span class="dt">u16</span><span class="pp">::</span>from_be_bytes([public_key[<span class="dv">1</span>]<span class="op">,</span> public_key[<span class="dv">2</span>]]) <span class="kw">as</span> <span class="dt">usize</span><span class="op">;</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> (len<span class="op">,</span> <span class="dv">3</span>)</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span> <span class="cf">else</span> <span class="op">{</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a> (public_key[<span class="dv">0</span>] <span class="kw">as</span> <span class="dt">usize</span><span class="op">,</span> <span class="dv">1</span>)</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a> <span class="op">};</span></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> exponent <span class="op">=</span> <span class="op">&amp;</span>public_key[exp_start<span class="op">..</span>exp_start <span class="op">+</span> exp_len]<span class="op">;</span></span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> modulus <span class="op">=</span> <span class="op">&amp;</span>public_key[exp_start <span class="op">+</span> exp_len<span class="op">..</span>]<span class="op">;</span></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a> <span class="co">// Build ASN.1 DER: SEQUENCE { INTEGER modulus, INTEGER exponent }</span></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> mod_der <span class="op">=</span> asn1_integer(modulus)<span class="op">;</span></span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> exp_der <span class="op">=</span> asn1_integer(exponent)<span class="op">;</span></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a> <span class="co">// ... wrap in SEQUENCE tag + length</span></span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>The <code>asn1_integer</code> function handles leading-zero stripping
(DER integers must be minimal) and sign-bit padding (high bit set means
negative in ASN.1, so positive numbers need a <code>0x00</code> prefix).
Getting this wrong produces keys that <code>ring</code> silently rejects
— one of the harder bugs to track down.</p>
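<p>For reference, here is one way the elided pieces could look: a DER length encoder, an <code>asn1_integer</code> that applies the two rules just described, and a bounds-checked version of the RFC 3110 parse. This is a sketch written for this explanation, not the code from <code>src/dnssec.rs</code>; <code>der_len</code> is a hypothetical helper name.</p>

```rust
/// DER length: one byte below 128, otherwise 0x80|n followed by n bytes.
pub fn der_len(len: usize) -> Vec<u8> {
    if len < 0x80 {
        return vec![len as u8];
    }
    let bytes = len.to_be_bytes();
    let first = bytes.iter().position(|&b| b != 0).unwrap_or(bytes.len() - 1);
    let mut out = vec![0x80 | (bytes.len() - first) as u8];
    out.extend_from_slice(&bytes[first..]);
    out
}

/// Encode an unsigned big-endian integer as a DER INTEGER.
pub fn asn1_integer(value: &[u8]) -> Vec<u8> {
    // Minimal encoding: drop leading zero bytes.
    let start = value.iter().position(|&b| b != 0).unwrap_or(value.len());
    let trimmed = &value[start..];
    let mut content = Vec::with_capacity(trimmed.len() + 1);
    // Zero needs one 0x00 byte; a set high bit needs a 0x00 prefix so the
    // value is not read as negative.
    if trimmed.first().map_or(true, |&b| b & 0x80 != 0) {
        content.push(0x00);
    }
    content.extend_from_slice(trimmed);
    let mut out = vec![0x02];
    out.extend(der_len(content.len()));
    out.extend(content);
    out
}

/// RFC 3110 key bytes to DER `SEQUENCE { INTEGER modulus, INTEGER exponent }`.
/// Returns `None` on truncated input instead of panicking.
pub fn rsa_dnskey_to_der(public_key: &[u8]) -> Option<Vec<u8>> {
    let (exp_len, exp_start) = match *public_key.first()? {
        0 => (
            u16::from_be_bytes([*public_key.get(1)?, *public_key.get(2)?]) as usize,
            3,
        ),
        n => (n as usize, 1),
    };
    let exponent = public_key.get(exp_start..exp_start + exp_len)?;
    let modulus = public_key.get(exp_start + exp_len..)?;
    if exponent.is_empty() || modulus.is_empty() {
        return None;
    }
    let mut body = asn1_integer(modulus);
    body.extend(asn1_integer(exponent));
    let mut der = vec![0x30];
    der.extend(der_len(body.len()));
    der.extend(body);
    Some(der)
}
```

<p>Using <code>get</code> with ranges rather than direct indexing matters here because the key bytes come off the wire and cannot be trusted to be well-formed.</p>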
<h3 id="ecdsa-is-simpler">ECDSA is simpler</h3>
<p>ECDSA P-256 keys in DNS are 64 bytes (x + y coordinates).
<code>ring</code> expects uncompressed point format: <code>0x04</code>
prefix + 64 bytes. The conversion is a prefix and a copy:</p>
<div class="sourceCode" id="cb6"><pre
class="sourceCode rust"><code class="sourceCode rust"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> <span class="kw">mut</span> uncompressed <span class="op">=</span> <span class="dt">Vec</span><span class="pp">::</span>with_capacity(<span class="dv">65</span>)<span class="op">;</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>uncompressed<span class="op">.</span>push(<span class="dv">0x04</span>)<span class="op">;</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>uncompressed<span class="op">.</span>extend_from_slice(public_key)<span class="op">;</span> <span class="co">// 64 bytes from DNS</span></span></code></pre></div>
<p>Signatures are also 64 bytes (r + s), used directly. No format
conversion needed.</p>
<h3 id="building-the-signed-data">Building the signed data</h3>
<p>An RRSIG doesn't sign the DNS packet; it signs a canonical form of
the records. Building this correctly is the most detail-sensitive part
of DNSSEC. The signed data is:</p>
<ol type="1">
<li>RRSIG RDATA fields (type covered, algorithm, labels, original TTL,
expiration, inception, key tag, signer name) — <em>without</em> the
signature itself</li>
<li>For each record in the RRset: owner name (lowercased, uncompressed)
+ type + class + original TTL (from the RRSIG, not the record's current
TTL) + RDATA length + canonical RDATA</li>
</ol>
<p>The records must be sorted by their canonical RDATA, compared byte by
byte (RFC 4034 §6.3). Owner names must be lowercased. The TTL must be
the <em>original</em> TTL from the RRSIG, not the decremented TTL from
caching.</p>
<p>Getting any of these details wrong — wrong TTL, wrong case, wrong
sort order, wrong RDATA encoding — produces a valid-looking but
incorrect signed data blob, and <code>ring</code> returns a signature
mismatch with no diagnostic information. I spent more time debugging
signed data construction than any other part of DNSSEC.</p>
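<p>A simplified sketch of that construction, to make the byte layout concrete. It assumes the RRSIG RDATA prefix (everything before the signature) and each record's RDATA are already in canonical wire form, and it skips wildcard expansion (the RRSIG labels field) and label length validation. The function names are hypothetical.</p>

```rust
/// Encode a name in canonical wire form: lowercase labels, no compression.
pub fn canonical_name(name: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for label in name.trim_end_matches('.').split('.').filter(|l| !l.is_empty()) {
        out.push(label.len() as u8);
        out.extend(label.bytes().map(|b| b.to_ascii_lowercase()));
    }
    out.push(0); // root label terminator
    out
}

/// Build the byte string that the RRSIG signature covers.
pub fn build_signed_data(
    rrsig_rdata_without_sig: &[u8],
    owner: &str,
    rtype: u16,
    class: u16,
    original_ttl: u32, // from the RRSIG, not the cached record
    rdatas: &[Vec<u8>],
) -> Vec<u8> {
    // Canonical order: RDATA compared as unsigned bytes, duplicates dropped.
    let mut sorted: Vec<&Vec<u8>> = rdatas.iter().collect();
    sorted.sort();
    sorted.dedup();

    let name = canonical_name(owner);
    let mut data = rrsig_rdata_without_sig.to_vec();
    for rdata in sorted {
        data.extend_from_slice(&name);
        data.extend_from_slice(&rtype.to_be_bytes());
        data.extend_from_slice(&class.to_be_bytes());
        data.extend_from_slice(&original_ttl.to_be_bytes());
        data.extend_from_slice(&(rdata.len() as u16).to_be_bytes());
        data.extend_from_slice(rdata);
    }
    data
}
```

<p>Because a mismatch gives no diagnostics, it helps to test this function on its own against a hand-built byte string before involving any crypto.</p>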
<h2 id="proving-a-name-doesnt-exist">Proving a name doesn't exist</h2>
<p>Verifying that <code>cloudflare.com</code> has a valid A record is
one thing. Proving that <code>doesnotexist.cloudflare.com</code>
<em>doesn't</em> exist, cryptographically and in a way that can't be
forged, is harder.</p>
<h3 id="nsec">NSEC</h3>
<p>NSEC records form a chain. Each NSEC says “the next name in this zone
after me is X, and at my name these record types exist.” If you query
<code>beta.example.com</code> and the zone has
<code>alpha.example.com → NSEC → gamma.example.com</code>, the gap
proves <code>beta</code> doesn't exist: there's nothing between
<code>alpha</code> and <code>gamma</code>.</p>
<p>For NXDOMAIN proofs, RFC 4035 §5.4 requires two things:</p>
<ol type="1">
<li>An NSEC record whose gap covers the queried name</li>
<li>An NSEC record proving no wildcard exists at the closest
encloser</li>
</ol>
<p>The canonical DNS name ordering (RFC 4034 §6.1) compares labels
right-to-left, case-insensitive. <code>a.example.com</code> &lt;
<code>b.example.com</code> because at the <code>example.com</code> level
they're equal, then <code>a</code> &lt; <code>b</code>. But
<code>z.example.com</code> &lt; <code>a.example.org</code> because
<code>.com</code> &lt; <code>.org</code> at the TLD level.</p>
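<p>That ordering, and the NSEC gap check built on it, can be sketched in a few lines. This version works on presentation-format names and lowercases ASCII only; <code>canonical_cmp</code> and <code>nsec_covers</code> are hypothetical names, and a real implementation would compare wire-format labels.</p>

```rust
use std::cmp::Ordering;

/// Split a name into lowercase labels, rightmost (TLD) first.
fn reversed_labels(name: &str) -> Vec<Vec<u8>> {
    name.trim_end_matches('.')
        .split('.')
        .filter(|l| !l.is_empty())
        .rev()
        .map(|l| l.bytes().map(|b| b.to_ascii_lowercase()).collect())
        .collect()
}

/// Canonical DNS name ordering (RFC 4034 section 6.1): compare labels
/// right to left; a name sorts before any name it is a suffix of.
pub fn canonical_cmp(a: &str, b: &str) -> Ordering {
    reversed_labels(a).cmp(&reversed_labels(b))
}

/// Does the NSEC record `owner -> next` cover `qname`, i.e. does `qname`
/// fall strictly inside the gap?
pub fn nsec_covers(owner: &str, next: &str, qname: &str) -> bool {
    let after_owner = canonical_cmp(owner, qname) == Ordering::Less;
    let before_next = canonical_cmp(qname, next) == Ordering::Less;
    if canonical_cmp(owner, next) == Ordering::Less {
        after_owner && before_next
    } else {
        // The last NSEC in a zone points back to the apex (wrap-around).
        after_owner || before_next
    }
}
```

<p>The wrap-around branch is easy to forget: without it, names that sort after the last entry in the zone can never be proven absent.</p>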
<h3 id="nsec3">NSEC3</h3>
<p>NSEC3 solves NSEC's zone enumeration problem: with NSEC, you can
walk the chain and discover every name in the zone. NSEC3 hashes the
names first (iterated SHA-1 with a salt), so the NSEC3 chain reveals
hashes, not names.</p>
<p>The proof is a 3-part closest encloser proof (RFC 5155 §8.4):</p>
<ol type="1">
<li><strong>Closest encloser</strong>: find an ancestor of the queried
name whose hash exactly matches an NSEC3 owner</li>
<li><strong>Next closer</strong>: the name one label longer than the
closest encloser must fall within an NSEC3 hash range (proving it
doesn't exist)</li>
<li><strong>Wildcard denial</strong>: the wildcard at the closest
encloser (<code>*.closest_encloser</code>) must also fall within an
NSEC3 hash range</li>
</ol>
<div class="sourceCode" id="cb7"><pre
class="sourceCode rust"><code class="sourceCode rust"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="co">// Pre-compute hashes for all ancestors</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> i <span class="kw">in</span> <span class="dv">0</span><span class="op">..</span>labels<span class="op">.</span>len() <span class="op">{</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> name<span class="op">:</span> <span class="dt">String</span> <span class="op">=</span> labels[i<span class="op">..</span>]<span class="op">.</span>join(<span class="st">&quot;.&quot;</span>)<span class="op">;</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a> ancestor_hashes<span class="op">.</span>push(nsec3_hash(<span class="op">&amp;</span>name<span class="op">,</span> algorithm<span class="op">,</span> iterations<span class="op">,</span> salt))<span class="op">;</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a><span class="co">// Walk from longest candidate: is this the closest encloser?</span></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> i <span class="kw">in</span> <span class="dv">1</span><span class="op">..</span>labels<span class="op">.</span>len() <span class="op">{</span></span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> ce_hash <span class="op">=</span> <span class="op">&amp;</span>ancestor_hashes[i]<span class="op">;</span></span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">!</span>decoded<span class="op">.</span>iter()<span class="op">.</span>any(<span class="op">|</span>(oh<span class="op">,</span> _)<span class="op">|</span> oh <span class="op">==</span> ce_hash) <span class="op">{</span> <span class="cf">continue</span><span class="op">;</span> <span class="op">}</span> <span class="co">// (1)</span></span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> nc_hash <span class="op">=</span> <span class="op">&amp;</span>ancestor_hashes[i <span class="op">-</span> <span class="dv">1</span>]<span class="op">;</span></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">!</span>nsec3_any_covers(<span class="op">&amp;</span>decoded<span class="op">,</span> nc_hash) <span class="op">{</span> <span class="cf">continue</span><span class="op">;</span> <span class="op">}</span> <span class="co">// (2)</span></span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> wc <span class="op">=</span> <span class="pp">format!</span>(<span class="st">&quot;*.{}&quot;</span><span class="op">,</span> labels[i<span class="op">..</span>]<span class="op">.</span>join(<span class="st">&quot;.&quot;</span>))<span class="op">;</span></span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> wc_hash <span class="op">=</span> nsec3_hash(<span class="op">&amp;</span>wc<span class="op">,</span> algorithm<span class="op">,</span> iterations<span class="op">,</span> salt)<span class="op">?;</span></span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> nsec3_any_covers(<span class="op">&amp;</span>decoded<span class="op">,</span> <span class="op">&amp;</span>wc_hash) <span class="op">{</span> proven <span class="op">=</span> <span class="cn">true</span><span class="op">;</span> <span class="cf">break</span><span class="op">;</span> <span class="op">}</span> <span class="co">// (3)</span></span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>I cap NSEC3 iterations at 500 (RFC 9276 recommends 0). Higher
iteration counts are a DoS vector: every name hashed during a proof costs
<code>iterations + 1</code> SHA-1 hashes.</p>
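<p>The "falls within an NSEC3 hash range" check used in steps 2 and 3 is not shown in the excerpt above. A plausible sketch, assuming each decoded NSEC3 record is an <code>(owner_hash, next_hash)</code> pair of raw hash bytes; the names <code>hash_range_covers</code> and <code>any_range_covers</code> are made up for this example:</p>

```rust
/// True if `target` lies strictly between `owner_hash` and `next_hash`
/// in the circular NSEC3 hash order.
pub fn hash_range_covers(owner_hash: &[u8], next_hash: &[u8], target: &[u8]) -> bool {
    if owner_hash < next_hash {
        owner_hash < target && target < next_hash
    } else {
        // Last record in the chain wraps around to the first. This branch
        // also handles a single-record chain where owner == next.
        target > owner_hash || target < next_hash
    }
}

/// True if any decoded NSEC3 record covers `target`.
pub fn any_range_covers(decoded: &[(Vec<u8>, Vec<u8>)], target: &[u8]) -> bool {
    decoded
        .iter()
        .any(|(owner, next)| hash_range_covers(owner, next, target))
}
```

<p>Byte slices compare lexicographically in Rust, which matches the ordering of equal-length hash values.</p>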
<h2 id="making-it-fast">Making it fast</h2>
<p>Cold-cache DNSSEC validation initially required ~5 network fetches
per query (DNSKEY for each zone in the chain, plus DS records). Three
optimizations brought this down to ~1:</p>
<p><strong>TLD priming</strong> (startup): fetch root DNSKEY + each
TLD's NS/DS/DNSKEY. After priming, the trust chain from root to any
<code>.com</code> zone is fully cached.</p>
<p><strong>Referral DS piggybacking</strong>: when a TLD server refers
you to <code>cloudflare.com</code>'s nameservers, the authority section
often includes DS records for the child zone. Cache them during
resolution instead of fetching separately during validation.</p>
<p><strong>DNSKEY prefetch</strong>: before the validation loop, scan
all RRSIGs for signer zones and batch-fetch any missing DNSKEYs. This
avoids serial DNSKEY fetches inside the per-RRset verification loop.</p>
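<p>The prefetch step boils down to a set difference: signer zones named in the RRSIGs, minus zones whose DNSKEYs are already cached. A small sketch under that assumption; <code>missing_signer_zones</code> and its string-based inputs are hypothetical, and the root zone is represented as an empty string after trimming.</p>

```rust
use std::collections::{BTreeSet, HashSet};

/// Collect the distinct signer zones that still need a DNSKEY fetch.
/// Names are normalized (lowercase, no trailing dot) before comparison.
pub fn missing_signer_zones(
    rrsig_signers: &[&str],
    cached_dnskey_zones: &HashSet<String>,
) -> Vec<String> {
    // BTreeSet dedupes and gives a stable, sorted fetch order.
    let wanted: BTreeSet<String> = rrsig_signers
        .iter()
        .map(|s| s.trim_end_matches('.').to_ascii_lowercase())
        .collect();
    wanted
        .into_iter()
        .filter(|zone| !cached_dnskey_zones.contains(zone))
        .collect()
}
```

<p>The returned list can then be fetched in one batch before any signature is checked.</p>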
<p>Result: a cold-cache query for <code>cloudflare.com</code> with full
DNSSEC validation takes ~90ms. The TLD chain is already warm; only one
DNSKEY fetch is needed (for <code>cloudflare.com</code> itself).</p>
<table>
<thead>
<tr>
<th>Operation</th>
<th>Time</th>
</tr>
</thead>
<tbody>
<tr>
<td>ECDSA P-256 verify</td>
<td>174 ns</td>
</tr>
<tr>
<td>Ed25519 verify</td>
<td>~200 ns</td>
</tr>
<tr>
<td>RSA/SHA-256 verify</td>
<td>10.9 µs</td>
</tr>
<tr>
<td>DS digest (SHA-256)</td>
<td>257 ns</td>
</tr>
<tr>
<td>Key tag computation</td>
<td>2063 ns</td>
</tr>
<tr>
<td>Cold-cache validation (1 fetch)</td>
<td>~90 ms</td>
</tr>
</tbody>
</table>
<p>The network fetch dominates. The crypto is noise.</p>
<h2 id="what-i-learned">What I learned</h2>
<p><strong>DNSSEC is a verification system, not an encryption
system.</strong> It proves authenticity: this record was signed by the
zone owner. It doesn't hide what you're querying. For privacy, you still
need encrypted transport (DoH/DoT) or recursive resolution (no single
upstream).</p>
<p><strong>The hardest bugs are in data serialization, not
crypto.</strong> <code>ring</code> either verifies or it doesn't, a
binary answer. But getting the signed data blob exactly right (correct
TTL, correct case, correct sort, correct RDATA encoding for each record
type) requires extreme precision. A single wrong byte means verification
fails with no hint about what's wrong.</p>
<p><strong>Negative proofs are harder than positive proofs.</strong>
Verifying a record exists: verify one RRSIG. Proving a record doesn't
exist: find the right NSEC/NSEC3 records, verify their RRSIGs, check gap
coverage, check wildcard denial, compute hashes. The NSEC3 closest
encloser proof alone has three sub-proofs, each requiring hash
computation and range checking.</p>
<p><strong>Performance optimization is about avoiding network, not
avoiding CPU.</strong> The crypto takes nanoseconds to microseconds. The
network fetch takes tens of milliseconds. Every optimization that
matters — TLD priming, DS piggybacking, DNSKEY prefetch — is about
eliminating a round trip, not speeding up a hash.</p>
<h2 id="whats-next">What's next</h2>
<p>Numa now has 13 feature layers, from basic DNS forwarding through
full recursive DNSSEC resolution. The immediate roadmap:</p>
<ul>
<li><strong>DoT (DNS-over-TLS)</strong>: the last encrypted transport
we don't support</li>
<li><strong><a href="https://github.com/pubky/pkarr">pkarr</a>
integration</strong>: self-sovereign DNS via the Mainline BitTorrent
DHT. Ed25519-signed DNS records published without a registrar.</li>
<li><strong>Global <code>.numa</code> names</strong>: human-readable
names backed by DHT, not ICANN</li>
</ul>
<p>The code is at <a
href="https://github.com/razvandimescu/numa">github.com/razvandimescu/numa</a>.
MIT license. The entire DNSSEC implementation is in <a
href="https://github.com/razvandimescu/numa/blob/main/src/dnssec.rs"><code>src/dnssec.rs</code></a>
(~1,600 lines) and <a
href="https://github.com/razvandimescu/numa/blob/main/src/recursive.rs"><code>src/recursive.rs</code></a>
(~600 lines).</p>
</article>
<footer class="blog-footer">
<a href="https://github.com/razvandimescu/numa">GitHub</a>
<a href="/">Home</a>
<a href="/blog/">Blog</a>
</footer>
</body>
</html>

View File

@@ -167,6 +167,13 @@ body::before {
<main class="blog-index">
<h1>Blog</h1>
<ul class="post-list">
<li>
<a href="/blog/dnssec-from-scratch.html">
<div class="post-title">Implementing DNSSEC from Scratch in Rust</div>
<div class="post-desc">Recursive resolution from root hints, chain-of-trust validation, NSEC/NSEC3 denial proofs, and what I learned implementing DNSSEC with zero DNS libraries.</div>
<div class="post-date">March 2026</div>
</a>
</li>
<li>
<a href="/blog/dns-from-scratch.html">
<div class="post-title">I Built a DNS Resolver from Scratch in Rust</div>

View File

@@ -1300,7 +1300,9 @@ footer .closing {
<span class="pipeline-arrow">&rarr;</span>
<div class="pipeline-node"><div class="pipeline-box">Cache</div></div>
<span class="pipeline-arrow">&rarr;</span>
<div class="pipeline-node"><div class="pipeline-box hl-violet">DoH Upstream</div></div>
<div class="pipeline-node"><div class="pipeline-box hl-violet">Recursive / Forward (DoH)</div></div>
<span class="pipeline-arrow">&rarr;</span>
<div class="pipeline-node"><div class="pipeline-box highlight">DNSSEC Validate</div></div>
<span class="pipeline-arrow">&rarr;</span>
<div class="pipeline-node"><div class="pipeline-box hl-emerald">Respond</div></div>
</div>
@@ -1525,6 +1527,14 @@ footer .closing {
<div class="perf-stat-value amber">0 allocations</div>
<div class="perf-stat-label">Heap allocations in the I/O path &mdash; 4KB stack buffers, inline serialization</div>
</div>
<div class="perf-stat">
<div class="perf-stat-value teal">174 ns</div>
<div class="perf-stat-label">ECDSA P-256 signature verification (DNSSEC). RSA/SHA-256: 10.9&micro;s. DS digest: 257ns.</div>
</div>
<div class="perf-stat">
<div class="perf-stat-value emerald">~90 ms</div>
<div class="perf-stat-label">Cold-cache DNSSEC validation &mdash; only 1 network fetch needed (TLD chain pre-warmed on startup)</div>
</div>
<p class="perf-note">
Cold queries match system resolver speed &mdash; the bottleneck is upstream RTT, not Numa. We don't claim to be faster when the network is the limit.
@@ -1554,17 +1564,20 @@ footer .closing {
<dt>DNS Libraries</dt>
<dd>Zero &mdash; wire protocol parsed from scratch</dd>
<dt>Resolution Modes</dt>
<dd>Recursive (iterative from root hints, CNAME chasing, glue extraction) or Forward (DoH / plain UDP)</dd>
<dt>DNSSEC</dt>
<dd>Chain-of-trust via ring &mdash; RSA/SHA-256, ECDSA P-256, Ed25519. NSEC/NSEC3 denial proofs. EDNS0 DO bit, 1232-byte payload (DNS Flag Day 2020).</dd>
<dt>Dependencies</dt>
<dd>18 runtime crates &mdash; tokio, axum, hyper, reqwest (DoH), rcgen + rustls (TLS), socket2 (multicast), serde, and more</dd>
<dd>19 runtime crates &mdash; tokio, axum, hyper, ring (DNSSEC), reqwest (DoH), rcgen + rustls (TLS), socket2 (multicast), serde, and more</dd>
<dt>Packet Format</dt>
<dd>RFC 1035 compliant, 4096-byte UDP (EDNS)</dd>
<dd>RFC 1035 compliant. EDNS0 OPT pseudo-record. Parses A, AAAA, NS, CNAME, MX, SOA, SRV, HTTPS, DNSKEY, DS, RRSIG, NSEC, NSEC3.</dd>
<dt>Concurrency</dt>
<dd>Arc&lt;ServerCtx&gt; + RwLock for reads, Mutex for writes (never across .await)</dd>
<dt>Upstream</dt>
<dd>DNS-over-HTTPS (DoH) via reqwest + http2 + rustls</dd>
</dl>
<div class="code-block reveal reveal-delay-2">
<span class="comment"># Install (pick one)</span>

View File

@@ -176,7 +176,7 @@ fn default_upstream_port() -> u16 {
53
}
fn default_timeout_ms() -> u64 {
3000
5000
}
#[derive(Deserialize)]

View File

@@ -154,7 +154,6 @@ pub async fn handle_query(
&qname,
qtype,
&ctx.cache,
ctx.timeout,
&query,
&ctx.root_hints,
)
@@ -162,28 +161,14 @@ pub async fn handle_query(
{
Ok(resp) => (resp, QueryPath::Recursive),
Err(e) => {
// Auto-fallback: retry via forward upstream if configured
let upstream = ctx.upstream.lock().unwrap().clone();
match forward_query(&query, &upstream, ctx.timeout).await {
Ok(resp) => {
debug!(
"{} | {:?} {} | RECURSIVE FALLBACK → FORWARD | {}",
src_addr, qtype, qname, e
);
ctx.cache.write().unwrap().insert(&qname, qtype, &resp);
(resp, QueryPath::Forwarded)
}
Err(e2) => {
error!(
"{} | {:?} {} | RECURSIVE+FORWARD FAILED | recursive: {} | forward: {}",
src_addr, qtype, qname, e, e2
);
(
DnsPacket::response_from(&query, ResultCode::SERVFAIL),
QueryPath::UpstreamError,
)
}
}
error!(
"{} | {:?} {} | RECURSIVE ERROR | {}",
src_addr, qtype, qname, e
);
(
DnsPacket::response_from(&query, ResultCode::SERVFAIL),
QueryPath::UpstreamError,
)
}
}
} else {

View File

@@ -74,6 +74,37 @@ pub(crate) async fn forward_udp(
DnsPacket::from_buffer(&mut recv_buffer)
}
/// DNS over TCP (RFC 1035 §4.2.2): 2-byte length prefix, then the DNS message.
pub(crate) async fn forward_tcp(
query: &DnsPacket,
upstream: SocketAddr,
timeout_duration: Duration,
) -> Result<DnsPacket> {
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;
let mut send_buffer = BytePacketBuffer::new();
query.write(&mut send_buffer)?;
let msg = send_buffer.filled();
let mut stream = timeout(timeout_duration, TcpStream::connect(upstream)).await??;
// Write length-prefixed message
stream.write_all(&(msg.len() as u16).to_be_bytes()).await?;
stream.write_all(msg).await?;
// Read length-prefixed response
let mut len_buf = [0u8; 2];
timeout(timeout_duration, stream.read_exact(&mut len_buf)).await??;
let resp_len = u16::from_be_bytes(len_buf) as usize;
let mut data = vec![0u8; resp_len];
timeout(timeout_duration, stream.read_exact(&mut data)).await??;
let mut recv_buffer = BytePacketBuffer::from_bytes(&data);
DnsPacket::from_buffer(&mut recv_buffer)
}
async fn forward_doh(
query: &DnsPacket,
url: &str,

View File

@@ -447,6 +447,7 @@ async fn network_watch_loop(ctx: Arc<numa::ctx::ServerCtx>) {
info!("LAN IP changed: {} → {}", current_ip, new_ip);
*current_ip = new_ip;
changed = true;
numa::recursive::reset_udp_state();
}
}

View File

@@ -4,7 +4,6 @@ use std::sync::RwLock;
use std::time::Duration;
use log::{debug, info};
use tokio::time::timeout;
use crate::cache::DnsCache;
use crate::forward::forward_udp;
@@ -15,9 +14,13 @@ use crate::record::DnsRecord;
const MAX_REFERRAL_DEPTH: u8 = 10;
const MAX_CNAME_DEPTH: u8 = 8;
const NS_QUERY_TIMEOUT: Duration = Duration::from_secs(2);
const NS_QUERY_TIMEOUT: Duration = Duration::from_millis(800);
const TCP_TIMEOUT: Duration = Duration::from_millis(1500);
const UDP_FAIL_THRESHOLD: u8 = 3;
static QUERY_ID: AtomicU16 = AtomicU16::new(1);
static UDP_FAILURES: std::sync::atomic::AtomicU8 = std::sync::atomic::AtomicU8::new(0);
static UDP_DISABLED: std::sync::atomic::AtomicBool = std::sync::atomic::AtomicBool::new(false);
fn next_id() -> u16 {
QUERY_ID.fetch_add(1, Ordering::Relaxed)
@@ -27,17 +30,31 @@ fn dns_addr(ip: impl Into<IpAddr>) -> SocketAddr {
SocketAddr::new(ip.into(), 53)
}
/// Query root servers for common TLDs and cache NS + glue + DNSKEY + DS records.
/// Pre-warms the DNSSEC trust chain so first queries skip chain-walking I/O.
pub fn reset_udp_state() {
UDP_DISABLED.store(false, Ordering::Relaxed);
UDP_FAILURES.store(0, Ordering::Relaxed);
}
pub async fn prime_tld_cache(cache: &RwLock<DnsCache>, root_hints: &[SocketAddr], tlds: &[String]) {
let root_addr = match root_hints.first() {
Some(addr) => *addr,
None => return,
};
if tlds.is_empty() {
if root_hints.is_empty() || tlds.is_empty() {
return;
}
let mut root_addr = root_hints[0];
for hint in root_hints {
info!("prime: probing root {}", hint);
match send_query(".", QueryType::NS, *hint).await {
Ok(_) => {
info!("prime: root {} reachable", hint);
root_addr = *hint;
break;
}
Err(e) => {
info!("prime: root {} failed: {}, trying next", hint, e);
}
}
}
// Fetch root DNSKEY (needed for DNSSEC chain-of-trust terminus)
if let Ok(root_dnskey) = send_query(".", QueryType::DNSKEY, root_addr).await {
cache
@@ -98,19 +115,12 @@ pub async fn resolve_recursive(
qname: &str,
qtype: QueryType,
cache: &RwLock<DnsCache>,
overall_timeout: Duration,
original_query: &DnsPacket,
root_hints: &[SocketAddr],
) -> crate::Result<DnsPacket> {
let mut resp = match timeout(
overall_timeout,
resolve_iterative(qname, qtype, cache, root_hints, 0, 0),
)
.await
{
Ok(result) => result?,
Err(_) => return Err(format!("recursive resolution timed out for {}", qname).into()),
};
// No overall timeout: each query is bounded by NS_QUERY_TIMEOUT (UDP) plus
// TCP_TIMEOUT (TCP fallback), and MAX_REFERRAL_DEPTH caps the chain length.
let mut resp = resolve_iterative(qname, qtype, cache, root_hints, 0, 0).await?;
resp.header.id = original_query.header.id;
resp.header.recursion_available = true;
@@ -136,7 +146,7 @@ pub(crate) fn resolve_iterative<'a>(
return Ok(cached);
}
let mut ns_addrs = find_starting_ns(qname, cache, root_hints);
let (mut current_zone, mut ns_addrs) = find_closest_ns(qname, cache, root_hints);
let mut ns_idx = 0;
for _ in 0..MAX_REFERRAL_DEPTH {
@@ -145,12 +155,14 @@ pub(crate) fn resolve_iterative<'a>(
None => return Err("no nameserver available".into()),
};
let (q_name, q_type) = minimize_query(qname, qtype, &current_zone);
debug!(
"recursive: querying {} for {:?} {} (depth {})",
ns_addr, qtype, qname, referral_depth
"recursive: querying {} for {:?} {} (zone: {}, depth {})",
ns_addr, q_type, q_name, current_zone, referral_depth
);
let response = match send_query(qname, qtype, ns_addr).await {
let response = match send_query(q_name, q_type, ns_addr).await {
Ok(r) => r,
Err(e) => {
debug!("recursive: NS {} failed: {}", ns_addr, e);
@@ -159,6 +171,27 @@ pub(crate) fn resolve_iterative<'a>(
}
};
// Minimized query response — treat as referral, not final answer
if (q_type != qtype || !q_name.eq_ignore_ascii_case(qname))
&& (!response.authorities.is_empty() || !response.answers.is_empty())
{
if let Some(zone) = referral_zone(&response) {
current_zone = zone;
}
let mut all_ns = extract_ns_from_records(&response.answers);
if all_ns.is_empty() {
all_ns = extract_ns_names(&response);
}
let new_addrs = resolve_ns_addrs_from_glue(&response, &all_ns, cache);
if !new_addrs.is_empty() {
ns_addrs = new_addrs;
ns_idx = 0;
continue;
}
ns_idx += 1;
continue;
}
if !response.answers.is_empty() {
let has_target = response.answers.iter().any(|r| r.query_type() == qtype);
@@ -201,32 +234,24 @@ pub(crate) fn resolve_iterative<'a>(
}
// Referral — extract NS + glue, cache glue, resolve NS addresses
// Update zone for query minimization
if let Some(zone) = referral_zone(&response) {
current_zone = zone;
}
let ns_names = extract_ns_names(&response);
if ns_names.is_empty() {
return Ok(response);
}
// Cache glue + DS from referral (avoids separate fetch during DNSSEC validation)
let mut new_ns_addrs = Vec::new();
{
let mut cache_w = cache.write().unwrap();
cache_glue(&mut cache_w, &response, &ns_names);
cache_ds_from_authority(&mut cache_w, &response);
}
for ns_name in &ns_names {
let glue = glue_addrs_for(&response, ns_name);
if !glue.is_empty() {
new_ns_addrs.extend_from_slice(&glue);
break;
}
}
let mut new_ns_addrs = resolve_ns_addrs_from_glue(&response, &ns_names, cache);
// If no glue, try cache (A then AAAA) then recursive resolve
if new_ns_addrs.is_empty() {
for ns_name in &ns_names {
new_ns_addrs.extend(addrs_from_cache(cache, ns_name));
if new_ns_addrs.is_empty() && referral_depth < MAX_REFERRAL_DEPTH {
if referral_depth < MAX_REFERRAL_DEPTH {
debug!("recursive: resolving glue-less NS {}", ns_name);
// Try A first, then AAAA
for qt in [QueryType::A, QueryType::AAAA] {
@@ -276,11 +301,13 @@ pub(crate) fn resolve_iterative<'a>(
})
}
fn find_starting_ns(
/// Find the closest cached NS zone and its resolved addresses.
/// Returns (zone_name, ns_addresses). Falls back to (".", root_hints).
fn find_closest_ns(
qname: &str,
cache: &RwLock<DnsCache>,
root_hints: &[SocketAddr],
) -> Vec<SocketAddr> {
) -> (String, Vec<SocketAddr>) {
let guard = cache.read().unwrap();
let mut pos = 0;
@@ -294,12 +321,8 @@ fn find_starting_ns(
if let Some(resp) = guard.lookup(host, qt) {
for rec in &resp.answers {
match rec {
DnsRecord::A { addr, .. } => {
addrs.push(dns_addr(*addr));
}
DnsRecord::AAAA { addr, .. } => {
addrs.push(dns_addr(*addr));
}
DnsRecord::A { addr, .. } => addrs.push(dns_addr(*addr)),
DnsRecord::AAAA { addr, .. } => addrs.push(dns_addr(*addr)),
_ => {}
}
}
@@ -309,7 +332,7 @@ fn find_starting_ns(
}
if !addrs.is_empty() {
debug!("recursive: starting from cached NS for zone '{}'", zone);
return addrs;
return (zone.to_string(), addrs);
}
}
@@ -324,7 +347,70 @@ fn find_starting_ns(
"recursive: starting from root hints ({} servers)",
root_hints.len()
);
root_hints.to_vec()
(".".to_string(), root_hints.to_vec())
}
/// Extract NS hostnames from any record section (answers or authorities).
fn extract_ns_from_records(records: &[DnsRecord]) -> Vec<String> {
records
.iter()
.filter_map(|r| match r {
DnsRecord::NS { host, .. } => Some(host.clone()),
_ => None,
})
.collect()
}
/// Resolve NS addresses from glue records, then cache fallback.
fn resolve_ns_addrs_from_glue(
response: &DnsPacket,
ns_names: &[String],
cache: &RwLock<DnsCache>,
) -> Vec<SocketAddr> {
let mut addrs = Vec::new();
{
let mut cache_w = cache.write().unwrap();
cache_glue(&mut cache_w, response, ns_names);
}
for ns_name in ns_names {
let glue = glue_addrs_for(response, ns_name);
if !glue.is_empty() {
addrs.extend_from_slice(&glue);
break;
}
}
if addrs.is_empty() {
for ns_name in ns_names {
addrs.extend(addrs_from_cache(cache, ns_name));
if !addrs.is_empty() {
break;
}
}
}
addrs
}
fn referral_zone(response: &DnsPacket) -> Option<String> {
response.authorities.iter().find_map(|r| match r {
DnsRecord::NS { domain, .. } => Some(domain.clone()),
_ => None,
})
}
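/// RFC 7816 (obsoleted by RFC 9156) query minimization, conservative variant:
/// only minimize at root. At root, ask for the NS of the TLD (last label)
/// instead of the full name and type; below root, send the query unchanged.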
fn minimize_query<'a>(
qname: &'a str,
qtype: QueryType,
current_zone: &str,
) -> (&'a str, QueryType) {
if current_zone != "." {
return (qname, qtype);
}
// At root: extract TLD (last label)
match qname.rfind('.') {
Some(dot) if dot > 0 => (&qname[dot + 1..], QueryType::NS),
_ => (qname, qtype),
}
}
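For contrast with the root-only variant above, here is a rough sketch of what per-zone minimization could look like: at every zone cut, reveal exactly one more label than the zone already known. The helper `minimized_name` is hypothetical and not part of this PR; it also assumes names are ASCII and may or may not carry a trailing dot.

```rust
/// Hypothetical helper (not in this PR): return the suffix of `qname` that
/// has exactly one more label than `current_zone`. Falls back to the full
/// name when `qname` is not strictly below `current_zone`.
fn minimized_name<'a>(qname: &'a str, current_zone: &str) -> &'a str {
    // Normalize an optional trailing dot; the root zone "." becomes "".
    let qname = qname.trim_end_matches('.');
    let zone = current_zone.trim_end_matches('.');

    // `prefix` is the part of `qname` to the left of the zone suffix.
    let prefix = if zone.is_empty() {
        qname
    } else {
        // Need at least one label plus a dot in front of the zone.
        if qname.len() <= zone.len() + 1 {
            return qname;
        }
        let cut = qname.len() - zone.len();
        if !qname.is_char_boundary(cut) {
            return qname;
        }
        let (head, tail) = qname.split_at(cut);
        if !tail.eq_ignore_ascii_case(zone) || !head.ends_with('.') {
            return qname; // not inside this zone
        }
        &head[..head.len() - 1]
    };

    // Keep only the last label of the prefix, plus the zone suffix.
    // `prefix` starts at index 0 of `qname`, so the offsets line up.
    match prefix.rfind('.') {
        Some(dot) => &qname[dot + 1..],
        None => qname,
    }
}
```

Under these assumptions, `www.example.com` would be asked as `com` at root, `example.com` at `com`, and in full at `example.com`. Whether the intermediate queries should use `NS` or the original type is a separate choice that this sketch leaves open.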
fn addrs_from_cache(cache: &RwLock<DnsCache>, name: &str) -> Vec<SocketAddr> {
@@ -461,7 +547,40 @@ async fn send_query(qname: &str, qtype: QueryType, server: SocketAddr) -> crate:
do_bit: true,
..Default::default()
});
forward_udp(&query, server, NS_QUERY_TIMEOUT).await
// IPv6 servers go straight to TCP: the UDP socket is bound to 0.0.0.0 and can't reach them
if server.is_ipv6() {
return crate::forward::forward_tcp(&query, server, TCP_TIMEOUT).await;
}
// If UDP has been detected as blocked, go TCP-first
if UDP_DISABLED.load(Ordering::Relaxed) {
return crate::forward::forward_tcp(&query, server, TCP_TIMEOUT).await;
}
match forward_udp(&query, server, NS_QUERY_TIMEOUT).await {
Ok(resp) if resp.header.truncated_message => {
debug!("send_query: truncated from {}, retrying TCP", server);
crate::forward::forward_tcp(&query, server, TCP_TIMEOUT).await
}
Ok(resp) => {
// UDP works — reset failure counter
UDP_FAILURES.store(0, Ordering::Relaxed);
Ok(resp)
}
Err(e) => {
let fails = UDP_FAILURES.fetch_add(1, Ordering::Relaxed) + 1;
// swap() sets the flag and reports its old value in one step, so only one task logs the switch
if fails >= UDP_FAIL_THRESHOLD && !UDP_DISABLED.swap(true, Ordering::Relaxed) {
info!(
"send_query: {} consecutive UDP failures — switching to TCP-first",
fails
);
}
debug!("send_query: UDP failed for {}: {}, trying TCP", server, e);
crate::forward::forward_tcp(&query, server, TCP_TIMEOUT).await
}
}
}
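The failure counting in `send_query` lives in process-wide statics, which the tests below have to reset by hand. As a sketch only, the same threshold logic could be held in a small struct so each test (or each resolver instance) gets its own state. `UdpHealth` and its method names are hypothetical and do not appear in this PR.

```rust
use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};

/// Hypothetical wrapper (not in this PR) around the UDP failure counter
/// and the "go TCP-first" flag.
pub struct UdpHealth {
    failures: AtomicU8,
    disabled: AtomicBool,
    threshold: u8,
}

impl UdpHealth {
    pub const fn new(threshold: u8) -> Self {
        Self {
            failures: AtomicU8::new(0),
            disabled: AtomicBool::new(false),
            threshold,
        }
    }

    /// True while UDP should still be tried first.
    pub fn udp_allowed(&self) -> bool {
        !self.disabled.load(Ordering::Relaxed)
    }

    /// A UDP reply arrived: clear the consecutive-failure count.
    pub fn record_success(&self) {
        self.failures.store(0, Ordering::Relaxed);
    }

    /// A UDP query failed. Returns true only for the call that flips the
    /// state to TCP-first, so the caller can log the switch once.
    pub fn record_failure(&self) -> bool {
        // Saturating increment avoids wrapping back to 0 at 255.
        let prev = self
            .failures
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |n| {
                Some(n.saturating_add(1))
            })
            .unwrap_or(u8::MAX);
        let fails = prev.saturating_add(1);
        fails >= self.threshold && !self.disabled.swap(true, Ordering::Relaxed)
    }

    /// Equivalent of `reset_udp_state()` for this instance.
    pub fn reset(&self) {
        self.disabled.store(false, Ordering::Relaxed);
        self.failures.store(0, Ordering::Relaxed);
    }
}
```

With a threshold of 3, the third consecutive `record_failure()` returns true and `udp_allowed()` turns false until `reset()` is called. Tests that share the global statics can interfere with each other when run in parallel; per-instance state like this is one way to avoid that.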
fn extract_cname_target(response: &DnsPacket, qname: &str) -> Option<String> {
@@ -589,13 +708,216 @@ mod tests {
}
#[test]
fn find_starting_ns_falls_back_to_hints() {
fn find_closest_ns_falls_back_to_hints() {
let cache = RwLock::new(DnsCache::new(100, 60, 86400));
let hints = vec![
dns_addr(Ipv4Addr::new(198, 41, 0, 4)),
dns_addr(Ipv4Addr::new(199, 9, 14, 201)),
];
let addrs = find_starting_ns("example.com", &cache, &hints);
let (zone, addrs) = find_closest_ns("example.com", &cache, &hints);
assert_eq!(zone, ".");
assert_eq!(addrs, hints);
}
#[test]
fn minimize_query_from_root() {
// At root, only reveal TLD
let (name, qt) = minimize_query("www.example.com", QueryType::A, ".");
assert_eq!(name, "com");
assert_eq!(qt, QueryType::NS);
}
#[test]
fn minimize_query_beyond_root_sends_full() {
// Beyond root, send full query (conservative minimization)
let (name, qt) = minimize_query("www.example.com", QueryType::A, "com");
assert_eq!(name, "www.example.com");
assert_eq!(qt, QueryType::A);
let (name, qt) = minimize_query("www.example.com", QueryType::A, "example.com");
assert_eq!(name, "www.example.com");
assert_eq!(qt, QueryType::A);
}
#[test]
fn minimize_query_single_label() {
// Single label (e.g., "com") from root — send as-is
let (name, qt) = minimize_query("com", QueryType::NS, ".");
assert_eq!(name, "com");
assert_eq!(qt, QueryType::NS);
}
// ---- Mock DNS server (TCP-only) for fallback tests ----
use crate::buffer::BytePacketBuffer;
use crate::header::ResultCode;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpListener;
/// Spawn a TCP-only DNS server on localhost. Returns the address.
/// The handler receives each query and returns a response packet.
async fn spawn_tcp_dns_server(
handler: impl Fn(&DnsPacket) -> DnsPacket + Send + Sync + 'static,
) -> SocketAddr {
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let addr = listener.local_addr().unwrap();
let handler = std::sync::Arc::new(handler);
tokio::spawn(async move {
loop {
let (mut stream, _) = match listener.accept().await {
Ok(c) => c,
Err(_) => break,
};
let handler = handler.clone();
tokio::spawn(async move {
// Read length-prefixed DNS query
let mut len_buf = [0u8; 2];
if stream.read_exact(&mut len_buf).await.is_err() {
return;
}
let len = u16::from_be_bytes(len_buf) as usize;
let mut data = vec![0u8; len];
if stream.read_exact(&mut data).await.is_err() {
return;
}
let mut buf = BytePacketBuffer::from_bytes(&data);
let query = match DnsPacket::from_buffer(&mut buf) {
Ok(q) => q,
Err(_) => return,
};
let response = handler(&query);
let mut resp_buf = BytePacketBuffer::new();
if response.write(&mut resp_buf).is_err() {
return;
}
let resp_bytes = resp_buf.filled();
let _ = stream
.write_all(&(resp_bytes.len() as u16).to_be_bytes())
.await;
let _ = stream.write_all(resp_bytes).await;
});
}
});
addr
}
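The mock server above relies on the DNS-over-TCP framing: each message is preceded by a two-byte big-endian length. A minimal sketch of that framing on plain byte slices follows; `frame` and `unframe` are illustrative names, not functions from this codebase.

```rust
/// Prefix a DNS message with its two-byte big-endian length.
/// Returns None if the message is too long for a 16-bit length field.
fn frame(msg: &[u8]) -> Option<Vec<u8>> {
    let len = u16::try_from(msg.len()).ok()?;
    let mut out = Vec::with_capacity(2 + msg.len());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(msg);
    Some(out)
}

/// Split one framed message off the front of `buf`.
/// Returns (message, remaining bytes), or None if `buf` is incomplete.
fn unframe(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 2 {
        return None;
    }
    let len = u16::from_be_bytes([buf[0], buf[1]]) as usize;
    let rest = &buf[2..];
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}
```

Using `u16::try_from` instead of an `as u16` cast makes an oversized message an explicit failure rather than a silently truncated length.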
/// TCP-only server returns authoritative answer directly.
/// Verifies: UDP fails → TCP fallback → resolves.
#[tokio::test]
async fn tcp_fallback_resolves_when_udp_blocked() {
UDP_DISABLED.store(false, Ordering::Relaxed);
UDP_FAILURES.store(0, Ordering::Relaxed);
let server_addr = spawn_tcp_dns_server(|query| {
let mut resp = DnsPacket::response_from(query, ResultCode::NOERROR);
resp.header.authoritative_answer = true;
if let Some(q) = query.questions.first() {
if q.qtype == QueryType::A || q.qtype == QueryType::NS {
resp.answers.push(DnsRecord::A {
domain: q.name.clone(),
addr: Ipv4Addr::new(10, 0, 0, 1),
ttl: 300,
});
}
}
resp
})
.await;
let result = send_query("test.example.com", QueryType::A, server_addr).await;
let resp = result.expect("should resolve via TCP fallback");
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
assert!(!resp.answers.is_empty());
match &resp.answers[0] {
DnsRecord::A { addr, .. } => assert_eq!(*addr, Ipv4Addr::new(10, 0, 0, 1)),
other => panic!("expected A record, got {:?}", other),
}
}
/// TCP-first path: with UDP disabled, send_query goes straight to the TCP-only mock.
/// The mock returns a referral for NS queries and an answer otherwise; this test
/// only exercises the A-query answer path, not a full iterative walk.
#[tokio::test]
async fn tcp_only_iterative_resolution() {
UDP_DISABLED.store(true, Ordering::Relaxed); // Skip UDP entirely for speed
let server_addr = spawn_tcp_dns_server(|query| {
let q = match query.questions.first() {
Some(q) => q,
None => return DnsPacket::response_from(query, ResultCode::SERVFAIL),
};
if q.qtype == QueryType::NS || q.name == "com" {
// Return referral — NS points back to ourselves (same IP, port 53 in glue
// won't work, but cache will have our address from root_hints)
let mut resp = DnsPacket::new();
resp.header.id = query.header.id;
resp.header.response = true;
resp.header.rescode = ResultCode::NOERROR;
resp.questions = query.questions.clone();
resp.authorities.push(DnsRecord::NS {
domain: "com".into(),
host: "ns1.com".into(),
ttl: 3600,
});
resp
} else {
// Return authoritative answer
let mut resp = DnsPacket::response_from(query, ResultCode::NOERROR);
resp.header.authoritative_answer = true;
resp.answers.push(DnsRecord::A {
domain: q.name.clone(),
addr: Ipv4Addr::new(10, 0, 0, 42),
ttl: 300,
});
resp
}
})
.await;
let result = send_query("hello.example.com", QueryType::A, server_addr).await;
let resp = result.expect("TCP-only send_query should work");
assert_eq!(resp.header.rescode, ResultCode::NOERROR);
match &resp.answers[0] {
DnsRecord::A { addr, .. } => assert_eq!(*addr, Ipv4Addr::new(10, 0, 0, 42)),
other => panic!("expected A, got {:?}", other),
}
}
#[tokio::test]
async fn tcp_fallback_handles_nxdomain() {
UDP_DISABLED.store(false, Ordering::Relaxed);
UDP_FAILURES.store(0, Ordering::Relaxed);
let server_addr = spawn_tcp_dns_server(|query| {
let mut resp = DnsPacket::response_from(query, ResultCode::NXDOMAIN);
resp.header.authoritative_answer = true;
resp
})
.await;
let cache = RwLock::new(DnsCache::new(100, 60, 86400));
let root_hints = vec![server_addr];
let result =
resolve_iterative("nonexistent.test", QueryType::A, &cache, &root_hints, 0, 0).await;
let resp = result.expect("NXDOMAIN should still return a response");
assert_eq!(resp.header.rescode, ResultCode::NXDOMAIN);
assert!(resp.answers.is_empty());
}
#[tokio::test]
async fn udp_auto_disable_resets() {
UDP_DISABLED.store(true, Ordering::Relaxed);
UDP_FAILURES.store(5, Ordering::Relaxed);
reset_udp_state();
assert!(!UDP_DISABLED.load(Ordering::Relaxed));
assert_eq!(UDP_FAILURES.load(Ordering::Relaxed), 0);
}
}