diff --git a/.github/workflows/static.yml b/.github/workflows/static.yml index b0274ce..9db4306 100644 --- a/.github/workflows/static.yml +++ b/.github/workflows/static.yml @@ -31,6 +31,10 @@ jobs: steps: - name: Checkout uses: actions/checkout@v4 + - name: Install pandoc + run: sudo apt-get install -y pandoc + - name: Generate blog HTML + run: make blog - name: Setup Pages uses: actions/configure-pages@v5 - name: Upload artifact diff --git a/.gitignore b/.gitignore index 1b715be..9dcba3d 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ /target CLAUDE.md docs/ +site/blog/posts/ diff --git a/Cargo.lock b/Cargo.lock index a8563e2..1367cd3 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -1148,12 +1148,14 @@ dependencies = [ "criterion", "env_logger", "futures", + "http", "http-body-util", "hyper", "hyper-util", "log", "rcgen", "reqwest", + "ring", "rustls", "serde", "serde_json", @@ -1162,6 +1164,7 @@ dependencies = [ "tokio", "tokio-rustls", "toml", + "tower", ] [[package]] diff --git a/Cargo.toml b/Cargo.toml index ea71da7..7143098 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -28,9 +28,12 @@ time = "0.3" rustls = "0.23" tokio-rustls = "0.26" arc-swap = "1" +ring = "0.17" [dev-dependencies] criterion = { version = "0.5", features = ["html_reports"] } +tower = { version = "0.5", features = ["util"] } +http = "1" [[bench]] name = "hot_path" @@ -39,3 +42,7 @@ harness = false [[bench]] name = "throughput" harness = false + +[[bench]] +name = "dnssec" +harness = false diff --git a/Makefile b/Makefile index 643c058..6d73acb 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ -.PHONY: all build lint fmt check audit test bench clean deploy blog +.PHONY: all build lint fmt check audit test coverage bench clean deploy blog -all: lint build +all: lint build test build: cargo build @@ -19,15 +19,18 @@ audit: test: cargo test +coverage: + cargo tarpaulin --skip-clean --out stdout + bench: cargo bench blog: - @mkdir -p site/blog + @mkdir -p site/blog/posts @for f in blog/*.md; 
do \ name=$$(basename "$$f" .md); \ - pandoc "$$f" --template=site/blog-template.html -o "site/blog/$$name.html"; \ - echo " $$f → site/blog/$$name.html"; \ + pandoc "$$f" --template=site/blog-template.html -o "site/blog/posts/$$name.html"; \ + echo " $$f → site/blog/posts/$$name.html"; \ done clean: diff --git a/README.md b/README.md index 92fa376..2131bb1 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ A portable DNS resolver in a single binary. Block ads on any network, name your local services (`frontend.numa`), and override any hostname with auto-revert — all from your laptop, no cloud account or Raspberry Pi required. -Built from scratch in Rust. Zero DNS libraries. RFC 1035 wire protocol parsed by hand. One ~8MB binary, no PHP, no web server, no database — everything is embedded. +Built from scratch in Rust. Zero DNS libraries. RFC 1035 wire protocol parsed by hand. Recursive resolution from root nameservers with full DNSSEC validation (chain-of-trust + NSEC/NSEC3 denial proofs). One ~8MB binary, no PHP, no web server, no database — everything is embedded. 
![Numa dashboard](assets/hero-demo.gif) @@ -135,6 +135,7 @@ bind_addr = "0.0.0.0:53" | Path-based routing | No | No | No | No | Prefix match + strip | | LAN service discovery | No | No | No | No | mDNS, opt-in | | Developer overrides | No | No | No | No | REST API + auto-expiry | +| Recursive resolver | No | No | Cloud only | Cloud only | From root hints, DNSSEC | | Encrypted upstream (DoH) | No (needs cloudflared) | Yes | Cloud only | Cloud only | Native, single binary | | Portable (travels with laptop) | No (appliance) | No (appliance) | Cloud only | Cloud only | Single binary | | Zero config | Complex | Docker/setup | Yes | Yes | Works out of the box | @@ -144,9 +145,11 @@ bind_addr = "0.0.0.0:53" ## How It Works ``` -Query → Overrides → .numa TLD → Blocklist → Local Zones → Cache → Upstream +Query → Overrides → .numa TLD → Blocklist → Local Zones → Cache → Recursive/Forward ``` +Two resolution modes: **forward** (relay to upstream like Quad9/Cloudflare) or **recursive** (resolve from root nameservers — no upstream dependency). Set `mode = "recursive"` in `[upstream]` to resolve independently. + No DNS libraries — no `hickory-dns`, no `trust-dns`. The wire protocol — headers, labels, compression pointers, record types — is parsed and serialized by hand. Runs on `tokio` + `axum`, async per-query task spawning. [Configuration reference](numa.toml) @@ -161,6 +164,8 @@ No DNS libraries — no `hickory-dns`, no `trust-dns`. 
The wire protocol — headers, labels, compression pointers, record types — is parsed and serialized by hand. Runs on `tokio` + `axum`, async per-query task spawning. [Configuration reference](numa.toml) @@ -161,6 +164,8 @@ - [x] Path-based routing — URL prefix routing with optional strip, REST API - [x] LAN service discovery — mDNS auto-discovery (opt-in), cross-machine DNS + proxy - [x] DNS-over-HTTPS — encrypted upstream via DoH (Quad9, Cloudflare, any provider) +- [x] Recursive resolution — resolve from root nameservers, no upstream dependency +- [x] DNSSEC validation — chain-of-trust, NSEC/NSEC3 denial proofs, AD bit (RSA, ECDSA, Ed25519) - [ ] pkarr integration — self-sovereign DNS via Mainline DHT (15M nodes) - [ ] Global `.numa` names — self-publish, DHT-backed, first-come-first-served diff --git a/benches/dnssec.rs b/benches/dnssec.rs new file mode 100644 index 0000000..270710a --- /dev/null +++ b/benches/dnssec.rs @@ -0,0 +1,183 @@ +use criterion::{black_box, criterion_group, criterion_main, Criterion}; + +use numa::dnssec; +use numa::question::QueryType; +use numa::record::DnsRecord; + +// Realistic ECDSA P-256 key (64 bytes) and signature (64 bytes) +fn make_ecdsa_key() -> Vec<u8> { + vec![0xAB; 64] +} +fn make_ecdsa_sig() -> Vec<u8> { + vec![0xCD; 64] +} + +// Realistic RSA-2048 key (RFC 3110 format: exp_len=3, exp=65537, mod=256 bytes) +fn make_rsa_key() -> Vec<u8> { + let mut key = vec![3u8]; // exponent length + key.extend(&[0x01, 0x00, 0x01]); // exponent = 65537 + key.extend(vec![0xFF; 256]); // modulus (256 bytes = 2048 bits) + key +} + +fn make_ed25519_key() -> Vec<u8> { + vec![0xEF; 32] +} + +fn make_dnskey(algorithm: u8, public_key: Vec<u8>) -> DnsRecord { + DnsRecord::DNSKEY { + domain: "example.com".into(), + flags: 257, + protocol: 3, + algorithm, + public_key, + ttl: 3600, + } +} + +fn make_rrsig(algorithm: u8, signature: Vec<u8>) -> DnsRecord { + DnsRecord::RRSIG { + domain: "example.com".into(), + type_covered: QueryType::A.to_num(), + algorithm, + labels: 2, + original_ttl: 300, + expiration: 2000000000, + inception: 1600000000, + key_tag: 12345, + signer_name: "example.com".into(), + signature, + ttl: 300, + } +} + +fn make_rrset() -> Vec<DnsRecord> { + vec![
DnsRecord::A { + domain: "example.com".into(), + addr: "93.184.216.34".parse().unwrap(), + ttl: 300, + }, + DnsRecord::A { + domain: "example.com".into(), + addr: "93.184.216.35".parse().unwrap(), + ttl: 300, + }, + ] +} + +fn bench_key_tag(c: &mut Criterion) { + let key = make_rsa_key(); + c.bench_function("key_tag_rsa2048", |b| { + b.iter(|| { + dnssec::compute_key_tag(black_box(257), black_box(3), black_box(8), black_box(&key)) + }) + }); + + let key = make_ecdsa_key(); + c.bench_function("key_tag_ecdsa_p256", |b| { + b.iter(|| { + dnssec::compute_key_tag(black_box(257), black_box(3), black_box(13), black_box(&key)) + }) + }); +} + +fn bench_name_to_wire(c: &mut Criterion) { + c.bench_function("name_to_wire_short", |b| { + b.iter(|| dnssec::name_to_wire(black_box("example.com"))) + }); + c.bench_function("name_to_wire_long", |b| { + b.iter(|| dnssec::name_to_wire(black_box("sub.deep.nested.example.co.uk"))) + }); +} + +fn bench_build_signed_data(c: &mut Criterion) { + let rrsig = make_rrsig(13, make_ecdsa_sig()); + let rrset = make_rrset(); + let rrset_refs: Vec<&DnsRecord> = rrset.iter().collect(); + + c.bench_function("build_signed_data_2_A_records", |b| { + b.iter(|| dnssec::build_signed_data(black_box(&rrsig), black_box(&rrset_refs))) + }); +} + +fn bench_verify_signature(c: &mut Criterion) { + // These will fail verification (keys/sigs are random), but we measure the + // crypto overhead — ring still does the full algorithm before returning error. 
+ let data = vec![0u8; 128]; // typical signed data size + + let rsa_key = make_rsa_key(); + let rsa_sig = vec![0xAA; 256]; // RSA-2048 signature + c.bench_function("verify_rsa_sha256_2048", |b| { + b.iter(|| { + dnssec::verify_signature( + black_box(8), + black_box(&rsa_key), + black_box(&data), + black_box(&rsa_sig), + ) + }) + }); + + let ecdsa_key = make_ecdsa_key(); + let ecdsa_sig = make_ecdsa_sig(); + c.bench_function("verify_ecdsa_p256", |b| { + b.iter(|| { + dnssec::verify_signature( + black_box(13), + black_box(&ecdsa_key), + black_box(&data), + black_box(&ecdsa_sig), + ) + }) + }); + + let ed_key = make_ed25519_key(); + let ed_sig = vec![0xBB; 64]; + c.bench_function("verify_ed25519", |b| { + b.iter(|| { + dnssec::verify_signature( + black_box(15), + black_box(&ed_key), + black_box(&data), + black_box(&ed_sig), + ) + }) + }); +} + +fn bench_ds_verification(c: &mut Criterion) { + let dk = make_dnskey(8, make_rsa_key()); + + // Compute correct DS digest + let owner_wire = dnssec::name_to_wire("example.com"); + let mut dnskey_rdata = vec![1u8, 1, 3, 8]; // flags=257, proto=3, algo=8 + dnskey_rdata.extend(&make_rsa_key()); + let mut input = Vec::new(); + input.extend(&owner_wire); + input.extend(&dnskey_rdata); + let digest = ring::digest::digest(&ring::digest::SHA256, &input); + + let ds = DnsRecord::DS { + domain: "example.com".into(), + key_tag: dnssec::compute_key_tag(257, 3, 8, &make_rsa_key()), + algorithm: 8, + digest_type: 2, + digest: digest.as_ref().to_vec(), + ttl: 86400, + }; + + c.bench_function("verify_ds_sha256", |b| { + b.iter(|| dnssec::verify_ds(black_box(&ds), black_box(&dk), black_box("example.com"))) + }); +} + +criterion_group!( + dnssec_benches, + bench_key_tag, + bench_name_to_wire, + bench_build_signed_data, + bench_verify_signature, + bench_ds_verification, +); +criterion_main!(dnssec_benches); diff --git a/blog/dns-from-scratch.md b/blog/dns-from-scratch.md index 0959fc7..7bf666c 100644 --- a/blog/dns-from-scratch.md +++ 
b/blog/dns-from-scratch.md @@ -8,7 +8,7 @@ I wanted to understand how DNS actually works. Not the "it translates domain nam So I built one from scratch in Rust. No `hickory-dns`, no `trust-dns`, no `simple-dns`. The entire RFC 1035 wire protocol — headers, labels, compression pointers, record types — parsed and serialized by hand. It started as a weekend learning project, became a side project I kept coming back to over 6 years, and eventually turned into [Numa](https://github.com/razvandimescu/numa) — which I now use as my actual system DNS. -A note on terminology before we go further: Numa is currently a *forwarding* resolver — it parses and caches DNS packets, but forwards queries to an upstream (Quad9, Cloudflare, or any DoH provider) rather than walking the delegation chain from root servers itself. Think of it as a smart proxy that does useful things with your DNS traffic locally (caching, ad blocking, overrides, local service domains) before forwarding what it can't answer. Full recursive resolution — where Numa talks directly to root and authoritative nameservers — is on the roadmap, along with DNSSEC validation. +A note on terminology: Numa supports two resolution modes. *Forward* mode relays queries to an upstream (Quad9, Cloudflare, or any DoH provider). *Recursive* mode walks the delegation chain from root servers itself — iterative queries to root, TLD, and authoritative nameservers, with full DNSSEC validation. In both modes, Numa does useful things with your DNS traffic locally (caching, ad blocking, overrides, local service domains) before resolving what it can't answer. This post covers the wire protocol and forwarding path; [the next post](/blog/posts/dnssec-from-scratch.html) covers recursive resolution and DNSSEC. Here's what surprised me along the way. 
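The mode switch described in the updated note is a one-line config change. A sketch following the `numa.toml` conventions elsewhere in this diff (keys shown are from the example config):

```toml
[upstream]
mode = "recursive"   # walk the delegation chain from root hints; no address needed
# mode = "forward"   # default: relay to an upstream resolver
# address = "https://dns.quad9.net/dns-query"   # only used in forward mode
```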
@@ -315,14 +315,13 @@ That creates the DNS entry, generates a TLS certificate, and starts proxying — ## What's next -Numa is at v0.5.0 with DNS forwarding, caching, ad blocking, DNS-over-HTTPS, .numa local domains with auto TLS, and LAN service discovery. +**Update (March 2026):** Recursive resolution and DNSSEC validation are now shipped. Numa resolves from root nameservers with full chain-of-trust verification (RSA/SHA-256, ECDSA P-256, Ed25519) and NSEC/NSEC3 authenticated denial of existence. -On the roadmap: +**[Read the follow-up: Implementing DNSSEC from Scratch in Rust →](/blog/posts/dnssec-from-scratch.html)** + +Still on the roadmap: - **DoT (DNS-over-TLS)** — DoH was first because it passes through captive portals and corporate firewalls (port 443 vs 853). DoT has less framing overhead, so it's faster. Both will be available. -- **Recursive resolution** — walk the delegation chain from root servers instead of forwarding. Combined with DNSSEC validation, this removes the need to trust any upstream resolver. - **[pkarr](https://github.com/pubky/pkarr) integration** — self-sovereign DNS via the Mainline BitTorrent DHT. Publish DNS records signed with your Ed25519 key, no registrar needed. -But those are rabbit holes for future posts. - [github.com/razvandimescu/numa](https://github.com/razvandimescu/numa) diff --git a/blog/dnssec-from-scratch.md b/blog/dnssec-from-scratch.md new file mode 100644 index 0000000..fc9db7c --- /dev/null +++ b/blog/dnssec-from-scratch.md @@ -0,0 +1,201 @@ +--- +title: Implementing DNSSEC from Scratch in Rust +description: Recursive resolution from root hints, chain-of-trust validation, NSEC/NSEC3 denial proofs, and what I learned implementing DNSSEC with zero DNS libraries. +date: March 2026 +--- + +In the [previous post](/blog/posts/dns-from-scratch.html) I covered how DNS works at the wire level — packet format, label compression, TTL caching, DoH. 
Numa was a forwarding resolver: it parsed packets, did useful things locally, and relayed the rest to Cloudflare or Quad9. + +That post ended with "recursive resolution and DNSSEC are on the roadmap." This post is about building both. + +The short version: Numa now resolves from root nameservers with iterative queries, validates the full DNSSEC chain of trust, and cryptographically proves that non-existent domains don't exist. No upstream dependency. No DNS libraries. Just `ring` for the crypto primitives and a lot of RFC reading. + +## Why recursive? + +A forwarding resolver trusts its upstream. When you ask Quad9 for `cloudflare.com`, you trust that Quad9 returns the real answer. If Quad9 lies, gets compromised, or is legally compelled to redirect you — you have no way to know. + +A recursive resolver doesn't trust anyone. It starts at the root nameservers (operated by 12 independent organizations) and follows the delegation chain: root → `.com` TLD → `cloudflare.com` authoritative servers. Each server only answers for its own zone. No single entity sees your full query pattern. + +DNSSEC adds cryptographic proof to each step. The root signs `.com`'s key. `.com` signs `cloudflare.com`'s key. `cloudflare.com` signs its own records. If any step is tampered with, the chain breaks and Numa rejects the response. + +## The iterative resolution loop + +Recursive resolution is a misnomer — the resolver actually uses *iterative* queries. It asks root "where is `cloudflare.com`?", root says "I don't know, but here are the `.com` nameservers." It asks `.com`, which says "here are cloudflare's nameservers." It asks those, and gets the answer. 
+ +``` +resolve("cloudflare.com", A) + → ask 198.41.0.4 (a.root-servers.net) + ← "try .com: ns1.gtld-servers.net (192.5.6.30)" [referral + glue] + → ask 192.5.6.30 (ns1.gtld-servers.net) + ← "try cloudflare: ns1.cloudflare.com (173.245.58.51)" [referral + glue] + → ask 173.245.58.51 (ns1.cloudflare.com) + ← "104.16.132.229" [answer] +``` + +The implementation (`src/recursive.rs`) is a loop with three possible outcomes per query: + +1. **Answer** — the server knows the record. Cache it, return it. +2. **Referral** — the server delegates to another zone. Extract NS records and glue (A/AAAA records for the nameservers, included in the additional section to avoid a chicken-and-egg problem), then query the next server. +3. **NXDOMAIN/REFUSED** — the name doesn't exist or the server refuses. Cache the negative result. + +CNAME chasing adds complexity: if you ask for `www.cloudflare.com` and get a CNAME to `cloudflare.com`, you need to restart resolution for the new name. I cap this at 8 levels. + +### TLD priming + +Cold-cache resolution is slow. Every query needs root → TLD → authoritative, each with its own network round-trip. For the first query to `example.com`, that's three serial UDP round-trips before you get an answer. + +TLD priming solves this. On startup, Numa queries root for NS records of 34 common TLDs (`.com`, `.org`, `.net`, `.io`, `.dev`, plus EU ccTLDs), caching NS records, glue addresses, DS records, and DNSKEY records. After priming, the first query to any `.com` domain skips root entirely — it already knows where `.com`'s nameservers are, and already has the DNSSEC keys needed to validate the response. + +## DNSSEC chain of trust + +DNSSEC doesn't encrypt DNS traffic. It *signs* it. Every DNS record can have an accompanying RRSIG (signature) record. 
The resolver verifies the signature against the zone's DNSKEY, then verifies that DNSKEY against the parent zone's DS (delegation signer) record, walking up until it reaches the root trust anchor — a hardcoded public key that IANA publishes and the entire internet agrees on. + +``` +cloudflare.com A 104.16.132.229 + signed by → RRSIG (key_tag=34505, algo=13, signer=cloudflare.com) + verified with → DNSKEY (cloudflare.com, key_tag=34505, ECDSA P-256) + vouched for by → DS (at .com, key_tag=2371, digest=SHA-256 of cloudflare's DNSKEY) + signed by → RRSIG (key_tag=19718, signer=com) + verified with → DNSKEY (com, key_tag=19718) + vouched for by → DS (at root, key_tag=30909) + signed by → RRSIG (signer=.) + verified with → DNSKEY (., key_tag=20326) ← root trust anchor (hardcoded) +``` + +### How keys get there + +The domain owner generates the DNSKEY keypair — typically their DNS provider (Cloudflare, etc.) does this. The owner then submits the DS record (a hash of their DNSKEY) to their registrar (Namecheap, GoDaddy), who passes it to the registry (Verisign for `.com`). The registry signs it into the TLD zone, and IANA signs the TLD's DS into the root. Trust flows up; keys flow down. + +The irony: you "own" your DNSSEC keys, but your registrar controls whether the DS record gets published. If they remove it — by mistake, by policy, or by court order — your DNSSEC chain breaks silently. + +### The trust anchor + +IANA's root KSK (Key Signing Key) has key tag 20326, algorithm 8 (RSA/SHA-256), and a 256-byte public key. It was last rolled in 2018. I hardcode it as a `const` array — this is the one thing in the entire system that requires out-of-band trust. + +```rust +const ROOT_KSK_PUBLIC_KEY: &[u8] = &[ + 0x03, 0x01, 0x00, 0x01, 0xac, 0xff, 0xb4, 0x09, + // ... 256 bytes total +]; +``` + +When IANA rolls this key (rare — the previous key lasted from 2010 to 2018), every DNSSEC validator on the internet needs updating. For Numa, that means a binary update. 
Something to watch. Every DNSKEY also has a key tag — a 16-bit checksum over its RDATA. The first test I wrote: compute the root KSK's key tag and assert it equals 20326. Instant confidence that the encoding is correct. + +## The crypto + +Numa uses `ring` for all cryptographic operations. Three algorithms cover the vast majority of signed zones: + +| Algorithm | ID | Usage | Verify time | +|---|---|---|---| +| RSA/SHA-256 | 8 | Root, most TLDs | 10.9 µs | +| ECDSA P-256 | 13 | Cloudflare, many modern zones | 174 ns | +| Ed25519 | 15 | Newer zones | ~200 ns | + +### RSA key format conversion + +DNS stores RSA public keys in RFC 3110 format (exponent length, exponent, modulus). `ring` expects PKCS#1 DER (ASN.1 encoded). Converting between them means writing a minimal ASN.1 encoder with leading-zero stripping and sign-bit padding. Getting this wrong produces keys that `ring` silently rejects — one of the harder bugs to track down. + +### ECDSA is simpler + +ECDSA P-256 keys in DNS are 64 bytes (x + y coordinates). `ring` expects uncompressed point format: `0x04` prefix + 64 bytes. One line: + +```rust +let mut uncompressed = Vec::with_capacity(65); +uncompressed.push(0x04); +uncompressed.extend_from_slice(public_key); // 64 bytes from DNS +``` + +Signatures are also 64 bytes (r + s), used directly. No format conversion needed. + +### Building the signed data + +RRSIG verification doesn't sign the DNS packet — it signs a canonical form of the records. Building this correctly is the most detail-sensitive part of DNSSEC. The signed data is: + +1. RRSIG RDATA fields (type covered, algorithm, labels, original TTL, expiration, inception, key tag, signer name) — *without* the signature itself +2. For each record in the RRset: owner name (lowercased, uncompressed) + type + class + original TTL (from the RRSIG, not the record's current TTL) + RDATA length + canonical RDATA + +The records must be sorted by their canonical wire-format representation. 
Owner names must be lowercased. The TTL must be the *original* TTL from the RRSIG, not the decremented TTL from caching. + +Getting any of these details wrong — wrong TTL, wrong case, wrong sort order, wrong RDATA encoding — produces a valid-looking but incorrect signed data blob, and `ring` returns a signature mismatch with no diagnostic information. I spent more time debugging signed data construction than any other part of DNSSEC. + +## Proving a name doesn't exist + +Verifying that `cloudflare.com` has a valid A record is one thing. Proving that `doesnotexist.cloudflare.com` *doesn't* exist — cryptographically, in a way that can't be forged — is harder. + +### NSEC + +NSEC records form a chain. Each NSEC says "the next name in this zone after me is X, and at my name these record types exist." If you query `beta.example.com` and the zone has `alpha.example.com → NSEC → gamma.example.com`, the gap proves `beta` doesn't exist — there's nothing between `alpha` and `gamma`. + +For NXDOMAIN proofs, RFC 4035 §5.4 requires two things: +1. An NSEC record whose gap covers the queried name +2. An NSEC record proving no wildcard exists at the closest encloser + +The canonical DNS name ordering (RFC 4034 §6.1) compares labels right-to-left, case-insensitive. `a.example.com` < `b.example.com` because at the `example.com` level they're equal, then `a` < `b`. But `z.example.com` < `a.example.org` because `.com` < `.org` at the TLD level. + +### NSEC3 + +NSEC3 solves NSEC's zone enumeration problem — with NSEC, you can walk the chain and discover every name in the zone. NSEC3 hashes the names first (iterated SHA-1 with a salt), so the NSEC3 chain reveals hashes, not names. + +The proof is a 3-part closest encloser proof (RFC 5155 §8.4): find an ancestor whose hash matches an NSEC3 owner, prove the next-closer name falls within a hash range gap, and prove the wildcard at the closest encloser also falls within a gap. All three must hold, or the denial is rejected. 
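Both the NSEC gap checks and RRset canonical sorting lean on that RFC 4034 §6.1 ordering. A std-only sketch of the name comparison — `canonical_cmp` is a hypothetical helper, not Numa's actual API, and it assumes ASCII names without escaped dots:

```rust
use std::cmp::Ordering;

// RFC 4034 §6.1 canonical DNS name ordering: compare labels
// right-to-left (most significant label last), case-insensitively,
// as raw octets. Hypothetical sketch, not Numa's real implementation.
fn canonical_cmp(a: &str, b: &str) -> Ordering {
    let la: Vec<&str> = a.trim_end_matches('.').split('.').collect();
    let lb: Vec<&str> = b.trim_end_matches('.').split('.').collect();
    for (x, y) in la.iter().rev().zip(lb.iter().rev()) {
        // Lowercase uppercase US-ASCII, then compare as unsigned octets.
        let (x, y) = (x.to_ascii_lowercase(), y.to_ascii_lowercase());
        match x.as_bytes().cmp(y.as_bytes()) {
            Ordering::Equal => continue,
            other => return other,
        }
    }
    // All shared labels equal: the name with fewer labels sorts first,
    // so a zone apex precedes every name under it.
    la.len().cmp(&lb.len())
}

fn main() {
    assert_eq!(canonical_cmp("a.example.com", "b.example.com"), Ordering::Less);
    assert_eq!(canonical_cmp("z.example.com", "a.example.org"), Ordering::Less);
    assert_eq!(canonical_cmp("example.com", "a.example.com"), Ordering::Less);
    println!("ok");
}
```

The right-to-left walk is what makes `z.example.com` sort before `a.example.org`: the TLD labels are compared first, and `com` < `org` decides the ordering before any left-hand label is looked at.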
+ +I cap NSEC3 iterations at 500 (RFC 9276 recommends 0). Higher iteration counts are a DoS vector — each verification requires `iterations + 1` SHA-1 hashes. + +## Making it fast + +Cold-cache DNSSEC validation initially required ~5 network fetches per query (DNSKEY for each zone in the chain, plus DS records). Three optimizations brought this down to ~1: + +**TLD priming** (startup) — fetch root DNSKEY + each TLD's NS/DS/DNSKEY. After priming, the trust chain from root to any `.com` zone is fully cached. + +**Referral DS piggybacking** — when a TLD server refers you to `cloudflare.com`'s nameservers, the authority section often includes DS records for the child zone. Cache them during resolution instead of fetching separately during validation. + +**DNSKEY prefetch** — before the validation loop, scan all RRSIGs for signer zones and batch-fetch any missing DNSKEYs. This avoids serial DNSKEY fetches inside the per-RRset verification loop. + +Result: a cold-cache query for `cloudflare.com` with full DNSSEC validation takes ~90ms. The TLD chain is already warm; only one DNSKEY fetch is needed (for `cloudflare.com` itself). + +| Operation | Time | +|---|---| +| ECDSA P-256 verify | 174 ns | +| Ed25519 verify | ~200 ns | +| RSA/SHA-256 verify | 10.9 µs | +| DS digest (SHA-256) | 257 ns | +| Key tag computation | 20–63 ns | +| Cold-cache validation (1 fetch) | ~90 ms | + +The network fetch dominates. The crypto is noise. + +## Surviving hostile networks + +I deployed Numa as my system DNS and switched to a different network. Everything broke. Every query: SERVFAIL, 3-second timeout. + +The network probe told the story: the ISP blocks outbound UDP port 53 to all servers except a handful of whitelisted public resolvers (Google, Cloudflare). Root servers, TLD servers, authoritative servers — all unreachable over UDP. The ISP forces you onto their DNS or a blessed upstream. Recursive resolution is impossible. + +Except TCP port 53 worked fine. 
And every DNS server is required to support TCP (RFC 1035 section 4.2.2). The ISP apparently only filters UDP. + +The fix has three parts: + +**TCP fallback.** Every outbound query tries UDP first (800ms timeout). If UDP fails or the response is truncated, retry immediately over TCP. TCP uses a 2-byte length prefix before the DNS message — trivial to implement, and it handles DNSSEC responses that exceed the UDP payload limit. + +**UDP auto-disable.** After 3 consecutive UDP failures, flip a global `AtomicBool` and skip UDP entirely — go TCP-first for all queries. This avoids burning 800ms per hop on a network where UDP will never work. The flag resets when the network changes (detected via LAN IP monitoring). + +**Query minimization (RFC 7816).** When querying root servers, send only the TLD — `com` instead of `secret-project.example.com`. Root servers handle trillions of queries and are operated by 12 organizations. Minimization reduces what they learn from yours. + +The result: on a network that blocks UDP:53, Numa detects the block within the first 3 queries, switches to TCP, and resolves normally at 300-500ms per cold query. Cached queries remain 0ms. No manual config change needed — switch networks and it adapts. + +I wouldn't have found this without dogfooding. The code worked perfectly on my home network. It took a real hostile network to expose the assumption that UDP always works. + +## What I learned + +**DNSSEC is a verification system, not an encryption system.** It proves authenticity — this record was signed by the zone owner. It doesn't hide what you're querying. For privacy, you still need encrypted transport (DoH/DoT) or recursive resolution (no single upstream). + +**The hardest bugs are in data serialization, not crypto.** `ring` either verifies or it doesn't — a binary answer. But getting the signed data blob exactly right (correct TTL, correct case, correct sort, correct RDATA encoding for each record type) requires extreme precision. 
A single wrong byte means verification fails with no hint about what's wrong. + +**Negative proofs are harder than positive proofs.** Verifying a record exists: verify one RRSIG. Proving a record doesn't exist: find the right NSEC/NSEC3 records, verify their RRSIGs, check gap coverage, check wildcard denial, compute hashes. The NSEC3 closest encloser proof alone has three sub-proofs, each requiring hash computation and range checking. + +**Performance optimization is about avoiding network, not avoiding CPU.** The crypto takes nanoseconds to microseconds. The network fetch takes tens of milliseconds. Every optimization that matters — TLD priming, DS piggybacking, DNSKEY prefetch — is about eliminating a round trip, not speeding up a hash. + +## What's next + +- **[pkarr](https://github.com/pubky/pkarr) integration** — self-sovereign DNS via the Mainline BitTorrent DHT. Your Ed25519 key is your domain. No registrar, no ICANN. +- **DoT (DNS-over-TLS)** — the last encrypted transport we don't support + +The code is at [github.com/razvandimescu/numa](https://github.com/razvandimescu/numa) — the DNSSEC validation is in [`src/dnssec.rs`](https://github.com/razvandimescu/numa/blob/main/src/dnssec.rs) and the recursive resolver in [`src/recursive.rs`](https://github.com/razvandimescu/numa/blob/main/src/recursive.rs). MIT license. 
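As a coda, a taste of the wire-format arithmetic involved: the key tag mentioned earlier — the 16-bit checksum that must come out to 20326 for the root KSK — is small enough to sketch in full. This follows RFC 4034 Appendix B; `key_tag` is a hypothetical std-only helper, not Numa's `dnssec::compute_key_tag`:

```rust
// Sketch of the RFC 4034 Appendix B key tag checksum over DNSKEY RDATA
// (flags, protocol, algorithm, public key). Hypothetical helper; the
// real function in src/dnssec.rs may differ.
fn key_tag(flags: u16, protocol: u8, algorithm: u8, public_key: &[u8]) -> u16 {
    let mut rdata = Vec::with_capacity(4 + public_key.len());
    rdata.extend_from_slice(&flags.to_be_bytes());
    rdata.push(protocol);
    rdata.push(algorithm);
    rdata.extend_from_slice(public_key);

    // Even-indexed bytes go in the high half, odd-indexed in the low half,
    // then the carry above 16 bits is folded back in once.
    let mut ac: u32 = 0;
    for (i, b) in rdata.iter().enumerate() {
        ac += if i & 1 == 1 { u32::from(*b) } else { u32::from(*b) << 8 };
    }
    ac += (ac >> 16) & 0xFFFF;
    (ac & 0xFFFF) as u16
}

fn main() {
    // Tiny illustrative key, not a real one.
    println!("{}", key_tag(257, 3, 8, &[0x01, 0x02]));
}
```

Accumulating into a `u32` matters: the checksum is defined over the full sum including carries, and only afterwards is the high half folded into the low 16 bits.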
diff --git a/numa.toml b/numa.toml index 09e8523..6a523ac 100644 --- a/numa.toml +++ b/numa.toml @@ -4,12 +4,39 @@ api_port = 5380 # api_bind_addr = "127.0.0.1" # default; set to "0.0.0.0" for LAN dashboard access # [upstream] -# address = "" # auto-detect from system resolver (default) +# mode = "forward" # "forward" (default) — relay to upstream +# # "recursive" — resolve from root hints (no address needed) # address = "https://dns.quad9.net/dns-query" # DNS-over-HTTPS (encrypted) # address = "https://cloudflare-dns.com/dns-query" # Cloudflare DoH # address = "9.9.9.9" # plain UDP -# port = 53 # only used for plain UDP +# port = 53 # only for forward mode, plain UDP # timeout_ms = 3000 +# root_hints = [ # only used in recursive mode +# "198.41.0.4", # a.root-servers.net (Verisign) +# "199.9.14.201", # b.root-servers.net (USC-ISI) +# "192.33.4.12", # c.root-servers.net (Cogent) +# "199.7.91.13", # d.root-servers.net (UMD) +# "192.203.230.10", # e.root-servers.net (NASA) +# "192.5.5.241", # f.root-servers.net (ISC) +# "192.112.36.4", # g.root-servers.net (US DoD) +# "198.97.190.53", # h.root-servers.net (US Army) +# "192.36.148.17", # i.root-servers.net (Netnod) +# "192.58.128.30", # j.root-servers.net (Verisign) +# "193.0.14.129", # k.root-servers.net (RIPE NCC) +# "199.7.83.42", # l.root-servers.net (ICANN) +# "202.12.27.33", # m.root-servers.net (WIDE) +# ] +# prime_tlds = [ # TLDs to pre-warm on startup (recursive mode) +# "com", "net", "org", "info", # gTLDs +# "io", "dev", "app", "xyz", "me", +# "eu", "uk", "de", "fr", "nl", # EU + European ccTLDs +# "it", "es", "pl", "se", "no", +# "dk", "fi", "at", "be", "ie", +# "pt", "cz", "ro", "gr", "hu", +# "bg", "hr", "sk", "si", "lt", +# "lv", "ee", "ch", "is", +# "co", "br", "au", "ca", "jp", # other major ccTLDs +# ] # [blocking] # enabled = true # set to false to disable ad blocking @@ -51,6 +78,11 @@ tld = "numa" # value = "127.0.0.1" # ttl = 60 +# DNSSEC signature validation (requires mode = "recursive") +# 
[dnssec] +# enabled = false # opt-in: verify chain of trust from root KSK +# strict = false # true = SERVFAIL on bogus signatures + # LAN service discovery via mDNS (disabled by default — no network traffic unless enabled) # [lan] # enabled = true # discover other Numa instances via mDNS (_numa._tcp.local) diff --git a/site/blog/dns-from-scratch.html b/site/blog/dns-from-scratch.html deleted file mode 100644 index 39f857f..0000000 --- a/site/blog/dns-from-scratch.html +++ /dev/null @@ -1,651 +0,0 @@ - - - - - -I Built a DNS Resolver from Scratch in Rust — Numa - - - - - - - - -
-
-

I Built a DNS Resolver from Scratch in Rust

- -
- -

I wanted to understand how DNS actually works. Not the “it translates -domain names to IP addresses” explanation — the actual bytes on the -wire. What does a DNS packet look like? How does label compression work? -Why is everything crammed into 512 bytes?

-

So I built one from scratch in Rust. No hickory-dns, no -trust-dns, no simple-dns. The entire RFC 1035 -wire protocol — headers, labels, compression pointers, record types — -parsed and serialized by hand. It started as a weekend learning project, -became a side project I kept coming back to over 6 years, and eventually -turned into Numa — -which I now use as my actual system DNS.

-

A note on terminology before we go further: Numa is currently a -forwarding resolver — it parses and caches DNS packets, but -forwards queries to an upstream (Quad9, Cloudflare, or any DoH provider) -rather than walking the delegation chain from root servers itself. Think -of it as a smart proxy that does useful things with your DNS traffic -locally (caching, ad blocking, overrides, local service domains) before -forwarding what it can’t answer. Full recursive resolution — where Numa -talks directly to root and authoritative nameservers — is on the -roadmap, along with DNSSEC validation.

-

Here’s what surprised me along the way.

-

What does a DNS -packet actually look like?

-

You can see a real one yourself. Run this:

-
dig @127.0.0.1 example.com A +noedns
-
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15242
-;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
-
-;; QUESTION SECTION:
-;example.com.                   IN      A
-
-;; ANSWER SECTION:
-example.com.            53      IN      A       104.18.27.120
-example.com.            53      IN      A       104.18.26.120
-

That’s the human-readable version. But what’s actually on the wire? A -DNS query for example.com A is just 29 bytes:

-
         ID    Flags  QCount ACount NSCount ARCount
-        ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐
-Header: AB CD  01 00  00 01  00 00  00 00  00 00
-        └────┘ └────┘ └────┘ └────┘ └────┘ └────┘
-         ↑      ↑      ↑
-         │      │      └─ 1 question, 0 answers, 0 authority, 0 additional
-         │      └─ Standard query, recursion desired
-         └─ Random ID (we'll match this in the response)
-
-Question: 07 65 78 61 6D 70 6C 65  03 63 6F 6D  00  00 01  00 01
-          ── ─────────────────────  ── ─────────  ──  ─────  ─────
-          7  e  x  a  m  p  l  e   3  c  o  m   end  A      IN
-          ↑                        ↑             ↑
-          └─ length prefix         └─ length     └─ root label (end of name)
-

12 bytes of header + 17 bytes of question = 29 bytes to ask “what’s -the IP for example.com?” Compare that to an HTTP request for the same -information — you’d need hundreds of bytes just for headers.

-

We can send exactly those bytes and capture what comes back:

-
python3 -c "
-import socket
-# Hand-craft a DNS query: header (12 bytes) + question (17 bytes)
-q  = b'\xab\xcd\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00'  # header
-q += b'\x07example\x03com\x00\x00\x01\x00\x01'              # question
-s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
-s.sendto(q, ('127.0.0.1', 53))
-resp = s.recv(512)
-for i in range(0, len(resp), 16):
-    h = ' '.join(f'{b:02x}' for b in resp[i:i+16])
-    a = ''.join(chr(b) if 32<=b<127 else '.' for b in resp[i:i+16])
-    print(f'{i:08x}  {h:<48s}  {a}')
-"
-
00000000  ab cd 81 80 00 01 00 02 00 00 00 00 07 65 78 61   .............exa
-00000010  6d 70 6c 65 03 63 6f 6d 00 00 01 00 01 07 65 78   mple.com......ex
-00000020  61 6d 70 6c 65 03 63 6f 6d 00 00 01 00 01 00 00   ample.com.......
-00000030  00 19 00 04 68 12 1b 78 07 65 78 61 6d 70 6c 65   ....h..x.example
-00000040  03 63 6f 6d 00 00 01 00 01 00 00 00 19 00 04 68   .com...........h
-00000050  12 1a 78                                          ..x
-

83 bytes back. Let’s annotate the response:

-
         ID    Flags  QCount ACount NSCount ARCount
-        ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐
-Header: AB CD  81 80  00 01  00 02  00 00  00 00
-        └────┘ └────┘ └────┘ └────┘ └────┘ └────┘
-         ↑      ↑      ↑      ↑
-         │      │      │      └─ 2 answers
-         │      │      └─ 1 question (echoed back)
-         │      └─ Response flag set, recursion available
-         └─ Same ID as our query
-
-Question: 07 65 78 61 6D 70 6C 65  03 63 6F 6D  00  00 01  00 01
-          (same as our query — echoed back)
-
-Answer 1: 07 65 78 61 6D 70 6C 65  03 63 6F 6D  00  00 01  00 01
-          ─────────────────────────────────────  ──  ─────  ─────
-          e  x  a  m  p  l  e  .  c  o  m       end  A      IN
-
-          00 00 00 19  00 04  68 12 1B 78
-          ───────────  ─────  ───────────
-          TTL: 25s     len:4  104.18.27.120
-
-Answer 2: (same domain repeated)  00 01  00 01  00 00 00 19  00 04  68 12 1A 78
-                                                                    ───────────
-                                                                    104.18.26.120
-

Notice something wasteful? The domain example.com -appears three times — once in the question, twice in the -answers. That’s 39 bytes of repeated names in an 83-byte packet. DNS has -a solution for this — but first, the overall structure.

-

The whole thing fits in a single UDP datagram. The structure is:

-
+--+--+--+--+--+--+--+--+
-|         Header         |  12 bytes: ID, flags, counts
-+--+--+--+--+--+--+--+--+
-|        Questions       |  What you're asking
-+--+--+--+--+--+--+--+--+
-|         Answers        |  The response records
-+--+--+--+--+--+--+--+--+
-|       Authorities      |  NS records for the zone
-+--+--+--+--+--+--+--+--+
-|       Additional       |  Extra helpful records
-+--+--+--+--+--+--+--+--+
-

In Rust, parsing the header is just reading 12 bytes and unpacking -the flags:

-
pub fn read(buffer: &mut BytePacketBuffer) -> Result<DnsHeader> {
-    let id = buffer.read_u16()?;
-    let flags = buffer.read_u16()?;
-    // Flags pack 9 fields into 16 bits
-    let recursion_desired = (flags & (1 << 8)) > 0;
-    let truncated_message = (flags & (1 << 9)) > 0;
-    let authoritative_answer = (flags & (1 << 10)) > 0;
-    let opcode = (flags >> 11) & 0x0F;
-    let response = (flags & (1 << 15)) > 0;
-    // ... and so on
-}
-

No padding, no alignment, no JSON overhead. DNS was designed in 1987 -when every byte counted, and honestly? The wire format is kind of -beautiful in its efficiency.
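To make the flag layout concrete, here is a standalone sketch that unpacks the 81 80 flags from the captured response. The helper name is hypothetical; Numa stores these as named fields on DnsHeader, as in the excerpt above.

```rust
/// Unpack the 16-bit DNS flags field into the pieces discussed above.
/// Returns (qr, opcode, rd, ra, rcode). Sketch only, mirroring the bit
/// positions in the DnsHeader excerpt.
fn unpack_flags(flags: u16) -> (bool, u8, bool, bool, u8) {
    let qr = (flags & (1 << 15)) != 0;          // query (0) vs response (1)
    let opcode = ((flags >> 11) & 0x0F) as u8;  // 0 = standard query
    let rd = (flags & (1 << 8)) != 0;           // recursion desired
    let ra = (flags & (1 << 7)) != 0;           // recursion available
    let rcode = (flags & 0x0F) as u8;           // 0 = NOERROR
    (qr, opcode, rd, ra, rcode)
}
```

Feeding it 0x8180 (the response) yields response + recursion desired + recursion available, opcode 0, NOERROR; feeding it 0x0100 (our query) yields only recursion desired.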

-

Label compression is the -clever part

-

Remember how example.com appeared three times in that -83-byte response? Domain names in DNS are stored as a sequence of -labels — length-prefixed segments:

-
example.com → [7]example[3]com[0]
-

The [7] means “the next 7 bytes are a label.” The -[0] is the root label (end of name). That’s 13 bytes per -occurrence, 39 bytes for three repetitions. In a response with authority -and additional records, domain names can account for half the -packet.
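The serialization side is just as direct. Here is a minimal sketch of the writer (a hypothetical write_qname, simplified: the real encoder would also have to reject labels over 63 bytes, since the top two bits of the length are reserved for pointers):

```rust
/// Encode a dotted name as DNS labels: length-prefixed segments,
/// terminated by the root label. Simplified sketch, no length checks.
fn write_qname(name: &str, out: &mut Vec<u8>) {
    for label in name.split('.').filter(|l| !l.is_empty()) {
        out.push(label.len() as u8);             // length prefix
        out.extend_from_slice(label.as_bytes()); // label bytes
    }
    out.push(0); // root label ends the name
}
```

Encoding "example.com" produces exactly the 13 bytes shown above: [7]example[3]com[0].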

-

DNS solves this with compression pointers — if the -top two bits of a length byte are 11, the remaining 14 bits -are an offset back into the packet where the rest of the name can be -found. A well-compressed version of our response would replace the -answer names with C0 0C — a 2-byte pointer to offset 12 -where example.com first appears in the question section. -That turns 39 bytes of names into 17 (13 + 2 + 2). Our upstream didn’t -bother compressing, but many do — especially when related domains -appear:

-
Offset 0x20: [6]google[3]com[0]        ← full name
-Offset 0x40: [4]mail[0xC0][0x20]       ← "mail" + pointer to offset 0x20
-Offset 0x50: [3]www[0xC0][0x20]        ← "www" + pointer to offset 0x20
-
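The offset arithmetic is worth isolating. A sketch with a hypothetical helper (Numa does this inline in read_qname below, using ^ 0xC0 rather than & 0x3F; the two are equivalent once the top-two-bits check has passed):

```rust
/// If the two bytes form a compression pointer (top two bits 11),
/// return the 14-bit offset it encodes; otherwise None.
fn pointer_offset(b0: u8, b1: u8) -> Option<u16> {
    if b0 & 0xC0 == 0xC0 {
        // Clear the two marker bits, then splice the remaining 14 bits.
        Some((((b0 & 0x3F) as u16) << 8) | b1 as u16)
    } else {
        None
    }
}
```

So C0 0C decodes to offset 12: the byte right after the 12-byte header, where the question's name begins.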

Pointers can chain — a pointer can point to another pointer. Parsing -this correctly requires tracking your position in the buffer and -handling jumps:

-
pub fn read_qname(&mut self, outstr: &mut String) -> Result<()> {
-    let mut pos = self.pos();
-    let mut jumped = false;
-    let mut delim = "";
-
-    loop {
-        let len = self.get(pos)?;
-
-        // Top two bits set = compression pointer
-        if (len & 0xC0) == 0xC0 {
-            if !jumped {
-                self.seek(pos + 2)?; // advance past the pointer
-            }
-            let offset = (((len as u16) ^ 0xC0) << 8) | self.get(pos + 1)? as u16;
-            pos = offset as usize;
-            jumped = true;
-            continue;
-        }
-
-        pos += 1;
-        if len == 0 { break; } // root label
-
-        outstr.push_str(delim);
-        outstr.push_str(&self.get_range(pos, len as usize)?
-            .iter().map(|&b| b as char).collect::<String>());
-        delim = ".";
-        pos += len as usize;
-    }
-
-    if !jumped {
-        self.seek(pos)?;
-    }
-    Ok(())
-}
-

This one bit me: when you follow a pointer, you must not -advance the buffer’s read position past where you jumped from. The -pointer is 2 bytes, so you advance by 2, but the actual label data lives -elsewhere in the packet. If you follow the pointer and also advance past -it, you’ll skip over the next record entirely. I spent a fun evening -debugging that one.

-

TTL adjustment on read, not -write

-

This is my favorite trick in the whole codebase. I initially stored -the remaining TTL and decremented it, which meant I needed a background -thread to sweep expired entries. It worked, but it felt wrong — too much -machinery for something simple.

-

The cleaner approach: store the original TTL and the timestamp when -the record was cached. On read, compute -remaining = original_ttl - elapsed. If it’s zero or -negative, the entry is stale — evict it lazily.

-
pub fn lookup(&mut self, domain: &str, qtype: QueryType) -> Option<DnsPacket> {
-    let key = (domain.to_lowercase(), qtype);
-    let entry = self.entries.get(&key)?;
-    let elapsed = entry.cached_at.elapsed().as_secs() as u32;
-
-    if elapsed >= entry.original_ttl {
-        self.entries.remove(&key);
-        return None;
-    }
-
-    // Adjust TTLs in the response to reflect remaining time
-    let mut packet = entry.packet.clone();
-    for answer in &mut packet.answers {
-        answer.set_ttl(entry.original_ttl.saturating_sub(elapsed));
-    }
-    Some(packet)
-}
-

No background thread. No timer. Entries expire lazily. The cache -stays consistent because every consumer sees the adjusted TTL.
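The expiry arithmetic on its own, as a sketch: trivial, but it pins down the boundary case — elapsed equal to the original TTL counts as stale, matching the >= in lookup above.

```rust
/// Remaining TTL given the original TTL and seconds since caching.
/// None means stale; the caller evicts the entry lazily.
fn remaining_ttl(original_ttl: u32, elapsed_secs: u32) -> Option<u32> {
    if elapsed_secs >= original_ttl {
        None // expired the moment elapsed reaches the original TTL
    } else {
        Some(original_ttl - elapsed_secs)
    }
}
```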

-

The resolution pipeline

-

Each incoming UDP packet spawns a tokio task. Each task walks a -deterministic pipeline — every step either answers or passes to the -next:

-
                     ┌─────────────────────────────────────────────────────┐
-                     │              Numa Resolution Pipeline               │
-                     └─────────────────────────────────────────────────────┘
-
-  Query ──→ Overrides ──→ .numa TLD ──→ Blocklist ──→ Zones ──→ Cache ──→ DoH
-    │        │              │             │             │         │         │
-    │        │ match?       │ match?      │ blocked?    │ match?  │ hit?    │
-    │        ↓              ↓             ↓             ↓         ↓         ↓
-    │      respond        respond       0.0.0.0      respond   respond   forward
-    │      (auto-reverts  (reverse      (ad gone)    (static   (TTL      to upstream
-    │       after N min)   proxy+TLS)                 records)  adjusted) (encrypted)
-    │
-    └──→ Each step either answers or passes to the next.
-

This is where “from scratch” pays off. Want conditional forwarding -for Tailscale? Insert a step before the upstream. Want to override -api.example.com for 5 minutes while debugging? Add an entry -in the overrides step — it auto-expires. A DNS library would have hidden -this pipeline behind an opaque resolve() call.
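The shape of that pipeline can be sketched with plain function pointers. Types and names here are a toy model, not Numa's actual API:

```rust
// Each step either answers the query or passes it along.
enum StepResult {
    Answer(&'static str),
    Pass,
}

/// Walk the steps in order; the first one that answers wins.
/// Anything nothing matches gets handed to the upstream.
fn resolve(query: &str, steps: &[fn(&str) -> StepResult]) -> &'static str {
    for step in steps {
        if let StepResult::Answer(a) = step(query) {
            return a;
        }
    }
    "forward" // no local step matched
}

/// Example step: the blocklist. A real one would consult a set.
fn blocklist(q: &str) -> StepResult {
    if q == "ads.example" {
        StepResult::Answer("0.0.0.0")
    } else {
        StepResult::Pass
    }
}
```

Adding conditional forwarding or a debugging override is just another function in the slice, which is the point.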

-

DNS-over-HTTPS: the -“wait, that’s it?” moment

-

The most recent addition, and honestly the one that surprised me with -how little code it needed. DoH (RFC 8484) is conceptually simple: take -the exact same DNS wire-format packet you’d send over UDP, POST it to an -HTTPS endpoint with Content-Type: application/dns-message, -and parse the response the same way. Same bytes, different -transport.

-
async fn forward_doh(
-    query: &DnsPacket,
-    url: &str,
-    client: &reqwest::Client,
-    timeout_duration: Duration,
-) -> Result<DnsPacket> {
-    let mut send_buffer = BytePacketBuffer::new();
-    query.write(&mut send_buffer)?;
-
-    let resp = timeout(timeout_duration, client
-        .post(url)
-        .header("content-type", "application/dns-message")
-        .header("accept", "application/dns-message")
-        .body(send_buffer.filled().to_vec())
-        .send())
-    .await??.error_for_status()?;
-
-    let bytes = resp.bytes().await?;
-    let mut recv_buffer = BytePacketBuffer::from_bytes(&bytes);
-    DnsPacket::from_buffer(&mut recv_buffer)
-}
-

The one gotcha that cost me an hour: Quad9 and other DoH providers -require HTTP/2. My first attempt used HTTP/1.1 and got a cryptic 400 Bad -Request. Adding the http2 feature to reqwest fixed it. The -upside of HTTP/2? Connection multiplexing means subsequent queries reuse -the TLS session — ~16ms vs ~50ms for the first query. Free -performance.

-

The Upstream enum dispatches between UDP and DoH based -on the URL scheme:

-
pub enum Upstream {
-    Udp(SocketAddr),
-    Doh { url: String, client: reqwest::Client },
-}
-

If the configured address starts with https://, it’s -DoH. Otherwise, plain UDP. Simple, no toggles.
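That dispatch is nearly a one-liner. A simplified sketch (the real Doh variant also carries a reqwest::Client, omitted here; the constructor name is hypothetical):

```rust
// Simplified version of the Upstream enum from the post.
enum Upstream {
    Udp(String),         // "address:port" for plain UDP
    Doh { url: String }, // DoH endpoint URL
}

/// Pick the transport from the configured address: https:// means DoH,
/// anything else is treated as a plain UDP resolver address.
fn upstream_from_config(address: &str, port: u16) -> Upstream {
    if address.starts_with("https://") {
        Upstream::Doh { url: address.to_string() }
    } else {
        Upstream::Udp(format!("{address}:{port}"))
    }
}
```

The port only matters for the UDP arm, which is why numa.toml documents it as "only for forward mode, plain UDP".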

-

“Why not just use dnsmasq -+ nginx + mkcert?”

-

You absolutely can — those are mature, battle-tested tools. The -difference is integration: with dnsmasq + nginx + mkcert, you’re -configuring three tools with three config formats. Numa puts the DNS -record, reverse proxy, and TLS cert behind one API call:

-
curl -X POST localhost:5380/services -d '{"name":"frontend","target_port":5173}'
-

That creates the DNS entry, generates a TLS certificate, and starts -proxying — including WebSocket upgrade for Vite HMR. One command, no -config files. Having full control over the resolution pipeline is what -makes auto-revert overrides and LAN discovery possible.

-

What I learned

-

DNS is a 40-year-old protocol that works remarkably -well. The wire format is tight, the caching model is elegant, -and the hierarchical delegation system has scaled to billions of queries -per day. The things people complain about (DNSSEC complexity, lack of -encryption) are extensions bolted on decades later, not flaws in the -original design.

-

The hard parts aren’t where you’d expect. Parsing -the wire protocol was straightforward (RFC 1035 is well-written). The -hard parts were: browsers rejecting wildcard certs under single-label -TLDs, macOS resolver quirks (scutil vs -/etc/resolv.conf), and getting multiple processes to bind -the same multicast port (SO_REUSEPORT on macOS, -SO_REUSEADDR on Linux).

-

Learn the vocabulary before you show up. I initially -called Numa a “DNS resolver” and got corrected — it’s a forwarding -resolver. The distinction matters to people who work with DNS -professionally, and being sloppy about it cost me credibility in my -first community posts.

-

What’s next

-

Numa is at v0.5.0 with DNS forwarding, caching, ad blocking, -DNS-over-HTTPS, .numa local domains with auto TLS, and LAN service -discovery.

-

On the roadmap:

-
    -
  • DoT (DNS-over-TLS) — DoH was first because it -passes through captive portals and corporate firewalls (port 443 vs -853). DoT has less framing overhead, so it’s faster. Both will be -available.
  • -
  • Recursive resolution — walk the delegation chain -from root servers instead of forwarding. Combined with DNSSEC -validation, this removes the need to trust any upstream resolver.
  • -
  • pkarr -integration — self-sovereign DNS via the Mainline BitTorrent -DHT. Publish DNS records signed with your Ed25519 key, no registrar -needed.
  • -
-

But those are rabbit holes for future posts.

-

github.com/razvandimescu/numa

-
- - - - - diff --git a/site/blog/index.html b/site/blog/index.html index 4034054..f19149c 100644 --- a/site/blog/index.html +++ b/site/blog/index.html @@ -168,7 +168,14 @@ body::before {

Blog