From 289f2b973b146bad2f30b4a7846c6ae91a7cb363 Mon Sep 17 00:00:00 2001
From: Razvan Dimescu
Date: Sat, 11 Apr 2026 14:10:13 +0300
Subject: [PATCH] chore: remove built blog HTML from tracking (built by CI)

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 site/blog/posts/dot-from-scratch.html | 553 --------------------------
 1 file changed, 553 deletions(-)
 delete mode 100644 site/blog/posts/dot-from-scratch.html

diff --git a/site/blog/posts/dot-from-scratch.html b/site/blog/posts/dot-from-scratch.html
deleted file mode 100644
index a620f3b..0000000
--- a/site/blog/posts/dot-from-scratch.html
+++ /dev/null
@@ -1,553 +0,0 @@
-
-

DNS-over-TLS from Scratch in Rust

- -
- -

The previous post ended by calling DoT the last encrypted transport Numa didn’t support. This post is about building it.

-

Numa now runs a DoT listener on port 853. My iPhone uses it as its system resolver, so ad blocking, DNSSEC validation, and recursive resolution follow my phone through the day. No cloud, no account, no companion app: just a self-signed cert, a .mobileconfig profile, and a QR code in the terminal.

-

RFC 7858 is a short RFC. The hard parts weren’t in it. They were in cross-protocol confusion defenses, a crypto-provider init gotcha that only triggered in one specific config combination, and a certificate SAN bug that iOS happily accepted and kdig immediately rejected. This post is about those parts.

-

Why DoT when you already have DoH?

-

Numa has shipped DoH since v0.1. Both protocols tunnel DNS over TLS; DoH wraps queries in HTTP/2, while DoT is DNS-over-TCP with TLS in front. Same privacy guarantees, different wrapper.

-

The answer to “why both” is that phones ask for DoT by name. iOS system DNS configures it with two fields (IP + server name) instead of a URL template. Android 9+ “Private DNS” speaks DoT natively. Linux stub resolvers such as systemd-resolved and stubby speak DoT as well. I wanted my phone on Numa without installing anything on the phone itself, and DoT is the protocol iOS and Android already speak for that.

-

The wire format is refreshingly small

-

The wire protocol in RFC 7858 fits in one sentence: DNS-over-TCP (RFC 1035 §4.2.2) with TLS in front, on port 853. DNS-over-TCP has existed since 1987: a 2-byte length prefix followed by the DNS message. DoT is that, wrapped in a TLS session. The entire framing code is a few lines:

-
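use std::io;
use tokio::io::AsyncWriteExt;

// Write one DNS message behind its 2-byte big-endian length prefix.
async fn write_framed<S>(stream: &mut S, msg: &[u8]) -> io::Result<()>
where S: AsyncWriteExt + Unpin {
    // A bare `as u16` cast would silently truncate an oversized message.
    let len = u16::try_from(msg.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "DNS message too long"))?;
    let mut out = Vec::with_capacity(2 + msg.len());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(msg);
    stream.write_all(&out).await?;
    stream.flush().await
}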
-

Reads are symmetric: read_exact two bytes, decode them as a big-endian u16, read_exact that many bytes. No HTTP headers, no chunked encoding, no extra framing layer.
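As a minimal sketch of that read path, here is a blocking version built on std::io::Read rather than tokio's async traits, so it runs without any dependencies. The function name read_framed is an assumption for illustration and is not taken from Numa's source.

```rust
use std::io::{self, Read};

/// Read one length-prefixed DNS message: a 2-byte big-endian length,
/// followed by exactly that many bytes of payload.
/// (Hypothetical helper; blocking std I/O stands in for the async version.)
fn read_framed<R: Read>(stream: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 2];
    // Fails with UnexpectedEof if the peer closed before sending a full prefix.
    stream.read_exact(&mut len_buf)?;
    let len = u16::from_be_bytes(len_buf) as usize;

    let mut msg = vec![0u8; len];
    stream.read_exact(&mut msg)?;
    Ok(msg)
}
```

Because the length is a u16, the buffer allocated per message can never exceed 65,535 bytes, so the prefix itself bounds memory use per read.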

-

Persistent connections

-

The first query on a fresh connection costs at least 3 RTTs (TCP handshake, TLS 1.3 handshake, then the query itself): about 300ms on a link with a 100ms round-trip time, three times the cost of a single-RTT UDP query. RFC 7858 §3.4 says clients SHOULD reuse the TCP connection for multiple queries, and every real DoT client does: iOS, Android, systemd, stubby. A single connection often carries hundreds of queries.

-

Timing diagram comparing a DNS lookup over plain UDP (1 RTT), over DoT on a fresh connection (3 RTTs — TCP handshake, TLS 1.3 handshake, then the query), and over a reused DoT session (1 RTT, same as UDP).

-

The amortization point is the whole game. If you only ever do one query per connection, DoT is roughly 3× slower than UDP and you should not use it. If you reuse the same TLS session for a browsing session’s worth of queries, the handshake is paid once and every subsequent query costs one RTT, the same as UDP.
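The arithmetic can be made concrete with a small sketch. It assumes sequential queries, a full TLS 1.3 handshake (no resumption or 0-RTT), and no pipelining; the function names are made up for this example.

```rust
/// Total latency for `queries` sequential lookups when every lookup opens a
/// fresh connection: TCP handshake + TLS 1.3 handshake + query = 3 RTTs each.
fn fresh_total_ms(queries: u32, rtt_ms: u32) -> u32 {
    queries * 3 * rtt_ms
}

/// Total latency when all lookups share one connection: the 2-RTT setup is
/// paid once, then each query costs a single RTT.
fn reused_total_ms(queries: u32, rtt_ms: u32) -> u32 {
    if queries == 0 {
        0
    } else {
        (2 + queries) * rtt_ms
    }
}
```

For 100 queries at a 100ms RTT, fresh connections add up to 30,000ms while one reused connection takes 10,200ms, an average of 102ms per query.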

-

The server is a loop: read a length-prefixed message, resolve it, write the response framed the same way, wait for the next one. Three timeouts keep it honest:

-
  • Handshake timeout (10s): a slowloris client that opens a TCP connection but never sends a ClientHello can’t pin a worker.
  • Idle timeout (30s): a connected client with nothing to say gets dropped.
  • Write timeout (10s): a stalled reader can’t hold a response buffer indefinitely.
-

A semaphore caps concurrent connections at 512 so a burst of -handshakes can’t exhaust the tokio runtime.
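To illustrate the connection cap, here is a std-only sketch of a counting limiter with RAII permits. It is not Numa's code: the post describes a semaphore on the tokio runtime, and the types ConnLimiter and Permit below are hypothetical. One assumption to flag: this version rejects a connection immediately when the cap is reached, whereas an async semaphore can also make the accept loop wait for a free slot.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counts live connections and refuses new ones past `max`.
pub struct ConnLimiter {
    active: AtomicUsize,
    max: usize,
}

/// Held for the lifetime of one connection; dropping it frees the slot.
pub struct Permit {
    limiter: Arc<ConnLimiter>,
}

impl ConnLimiter {
    pub fn new(max: usize) -> Arc<Self> {
        Arc::new(Self { active: AtomicUsize::new(0), max })
    }

    /// Returns a permit if a slot is free, or None when the cap is reached.
    pub fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
        let mut cur = self.active.load(Ordering::Acquire);
        loop {
            if cur >= self.max {
                return None;
            }
            // Retry if another thread changed the counter in the meantime.
            match self.active.compare_exchange_weak(
                cur, cur + 1, Ordering::AcqRel, Ordering::Acquire,
            ) {
                Ok(_) => return Some(Permit { limiter: Arc::clone(self) }),
                Err(actual) => cur = actual,
            }
        }
    }

    pub fn active(&self) -> usize {
        self.active.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.limiter.active.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Tying the slot to a value that is dropped when the connection task ends means an early return or a panic in the handler still releases the slot.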

-

ALPN, the cross-protocol defense that matters

-

If DoT lives on port 853 and HTTPS on 443, what stops an HTTP/2 client from hitting 853 and getting confused replies? Cross-protocol attacks exist and have had real CVEs. The defense is ALPN: during the TLS handshake the client advertises protocols, and the server picks one it supports or fails the handshake. A DoT server advertises "dot"; a client offering only "h2" gets a no_application_protocol fatal alert before any frames are exchanged.
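The selection rule itself is small enough to sketch as a pure function. This models the behavior described in RFC 7301 for a client that sends an ALPN list, not the internals of any TLS library; select_alpn is a hypothetical name.

```rust
/// TLS alert code for `no_application_protocol` (RFC 7301).
const NO_APPLICATION_PROTOCOL: u8 = 120;

/// Pick the first server-supported protocol that the client also offered,
/// in server preference order. With no overlap, return the fatal alert code.
fn select_alpn<'a>(server: &[&'a [u8]], client: &[&[u8]]) -> Result<&'a [u8], u8> {
    server
        .iter()
        .copied()
        .find(|s| client.iter().any(|c| c == s))
        .ok_or(NO_APPLICATION_PROTOCOL)
}
```

A server list of just "dot" against a client list of just "h2" has no overlap, so the handshake ends with alert 120 before any application data is exchanged.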

-

rustls enforces this by default when you set alpn_protocols:

-
let mut config = ServerConfig::builder()
    .with_no_client_auth()
    .with_single_cert(certs, key)?;
config.alpn_protocols = vec![b"dot".to_vec()];
-

“The library enforces it by default” has a latent risk: a future rustls upgrade could change the default, and the defense would quietly evaporate. I wrote a test that pins the behavior so any regression in a dependency update fails loudly:

-
#[tokio::test]
async fn dot_rejects_non_dot_alpn() {
    let (addr, cert_der) = spawn_dot_server().await;
    let client_config = dot_client(&cert_der, vec![b"h2".to_vec()]);
    let connector = tokio_rustls::TlsConnector::from(client_config);
    let tcp = tokio::net::TcpStream::connect(addr).await.unwrap();
    let result = connector
        .connect(ServerName::try_from("numa.numa").unwrap(), tcp)
        .await;
    assert!(result.is_err(),
        "DoT server must reject ALPN that doesn't include \"dot\"");
}
-

When you’re leaning on a library’s default for a security-critical invariant, the test is the contract.

-

Two bugs that hid for days

-

Both were fixed before v0.10 shipped. Both stayed hidden because my initial tests were too forgiving: a permissive client in one case, a convenient config combination in the other.

-

The rustls crypto provider panic

-

rustls 0.23 requires a CryptoProvider installed before you can build a ServerConfig. Numa’s HTTPS proxy calls install_default as a side effect when it builds its own config, so DoT “just worked” for users who enabled both: the proxy had already initialized the provider before DoT’s first handshake.

-

Then I added support for user-provided DoT certificates. Someone running DoT with their own Let’s Encrypt cert, with the HTTPS proxy disabled, would hit:

-
thread 'dot' panicked at rustls-0.23.25/src/crypto/mod.rs:185:14:
no process-level CryptoProvider available -- call
CryptoProvider::install_default() before this point
-

The panic happened on the first client connection, not at startup. I found it while writing the integration suite for “DoT with BYO cert, proxy disabled”, the one combination nobody had ever actually exercised: the first run panicked. The fix is two lines: call install_default inside load_tls_config so DoT can stand alone. If a side effect initializes something and you have a path that skips that side effect, you have a bug waiting for a specific deployment.
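The shape of that fix can be sketched without rustls, using std::sync::OnceLock as a stand-in for the process-level provider slot. Everything here is an analogy under stated assumptions: the static PROVIDER, this install_default, and this load_tls_config are simplified illustrations, not rustls's API or Numa's actual function bodies.

```rust
use std::sync::OnceLock;

/// Stand-in for the process-wide crypto provider slot.
static PROVIDER: OnceLock<&'static str> = OnceLock::new();

/// Idempotent install: the first caller wins, later calls are harmless no-ops.
/// Returns true only when this call performed the installation.
fn install_default(name: &'static str) -> bool {
    PROVIDER.set(name).is_ok()
}

/// The DoT path installs its own dependency instead of hoping that some
/// other module (such as the HTTPS proxy) ran first.
fn load_tls_config() -> Result<String, String> {
    install_default("ring");
    let provider = PROVIDER.get().ok_or("no process-level provider")?;
    Ok(format!("ServerConfig using {provider}"))
}
```

Calling install_default from both modules is safe precisely because the second call changes nothing, so neither module needs to know whether the other is enabled.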

-

The SAN bug iOS was happy to accept

-

Numa’s self-signed DoT cert is generated on first run from a local CA alongside the data directory. It needs to match whatever ServerName the client sends as SNI. For the HTTPS proxy, that’s the wildcard domain pattern *.numa (matching frontend.numa, api.numa, etc.). I initially reused the same SAN list for DoT: a wildcard *.numa and nothing else.

-

On an iPhone this worked perfectly. Full browsing session, persistent connections in the log, ad blocking active. I was about to merge when I ran one last smoke test with kdig (GnuTLS-backed, from Knot DNS):

-
$ kdig @192.168.1.16 -p 853 +tls \
    +tls-ca=/usr/local/var/numa/ca.pem \
    +tls-hostname=numa.numa example.com A

;; TLS, handshake failed (Error in the certificate.)
-

Huh.

-

RFC 6125 §6.4.3 restricts a wildcard in a certificate’s DNS-ID to the leftmost label, where it stands in for exactly one label. That rule alone would let *.numa cover numa.numa, but strict clients go further: they refuse a wildcard that sits directly under a single-label TLD as too broad, so a certificate whose only SAN is *.numa fails validation for numa.numa.

-

iOS’s TLS stack is lenient and accepts it. GnuTLS, NSS (Firefox), and most non-Apple validators don’t. The fix is five lines: add an explicit numa.numa SAN alongside the wildcard. But the lesson is the one that stuck: I wrote a commit message saying “fix an iOS bug” and had to rewrite it, because iOS was fine. The real bug was that every GnuTLS/NSS-based client on the planet would have rejected the cert, and I only found it by running one more test with a stricter tool.
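A simplified model shows why the explicit SAN fixes things for both kinds of client. This is a sketch, not the algorithm of any real TLS stack: the strict flag assumes a validator that refuses wildcards with only a single label to their right, and both function names are hypothetical.

```rust
/// Does one SAN entry match the hostname?
/// `strict` models validators that refuse a wildcard sitting directly under
/// a single-label TLD (an assumption made for this sketch).
fn san_matches(san: &str, host: &str, strict: bool) -> bool {
    let san = san.to_ascii_lowercase();
    let host = host.to_ascii_lowercase();
    match san.strip_prefix("*.") {
        // No wildcard: the names must be identical.
        None => san == host,
        Some(suffix) => {
            if strict && !suffix.contains('.') {
                return false;
            }
            // The wildcard stands in for exactly one leftmost label.
            match host.split_once('.') {
                Some((first, rest)) => !first.is_empty() && rest == suffix,
                None => false,
            }
        }
    }
}

/// A certificate covers the host if any of its SAN entries matches.
fn cert_covers(sans: &[&str], host: &str, strict: bool) -> bool {
    sans.iter().any(|s| san_matches(s, host, strict))
}
```

In this model the wildcard-only list covers numa.numa for the lenient check and fails the strict one, while the list with both *.numa and numa.numa passes either way because the exact entry never depends on wildcard policy.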

-
-

Test with the strict client. The permissive client hides your bugs.

-
-

Getting your phone onto it

-

A DoT server is useless without a way to point a phone at it. iOS won’t let you type an IP and a server name into Settings directly. Instead, you install a .mobileconfig profile that bundles the CA as a trust anchor and the DNS settings in a single payload.
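For orientation, the DNS part of such a profile is a small plist dictionary. The sketch below renders only that fragment, using the DNSSettings keys from Apple's configuration profile format (DNSProtocol, ServerAddresses, ServerName). It is not Numa's generator: the function name is hypothetical, the surrounding payload (identifiers, UUIDs, the CA certificate) is omitted, and the inputs are assumed to be a plain IP and hostname that need no XML escaping.

```rust
/// Render the DNSSettings dictionary for a DoT resolver.
/// (Hypothetical helper; a real profile wraps this in a full payload.)
fn dns_settings_fragment(server_ip: &str, server_name: &str) -> String {
    format!(
        r#"<key>DNSSettings</key>
<dict>
    <key>DNSProtocol</key>
    <string>TLS</string>
    <key>ServerAddresses</key>
    <array>
        <string>{server_ip}</string>
    </array>
    <key>ServerName</key>
    <string>{server_name}</string>
</dict>"#
    )
}
```

These are the same two fields mentioned earlier: the address the phone connects to, and the name it expects to find in the certificate.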

-

Numa ships a subcommand that builds one on the fly, serves it over HTTP, and renders the download URL as a QR code in the terminal:

-
$ numa setup-phone

  Numa Phone Setup

  Profile URL: http://192.168.1.10:8765/mobileconfig

  ██████████████████████████████
  ██                          ██
  ██   [QR code rendered in   ██
  ██    your terminal]        ██
  ██                          ██
  ██████████████████████████████

  On your iPhone:
    1. Open Camera, point at the QR code, tap the yellow banner
    2. Allow the download when Safari asks
    3. Open Settings — tap "Profile Downloaded" near the top
       (or: Settings → General → VPN & Device Management → Numa DNS)
    4. Tap Install (top right), enter passcode, Install again
    5. Settings → General → About → Certificate Trust Settings
       Toggle ON "Numa Local CA" — required for DoT to work
-

The same QR is available in the dashboard: click “Phone Setup” in the header and the popover renders an SVG QR code pointing at the mobileconfig URL. On mobile viewports it shows a direct download link instead.

-

Numa dashboard with Phone Setup popover showing QR code and install instructions

-

Step 5 is non-negotiable. Even though the CA is bundled in the same profile that installs the DNS settings, iOS still requires the user to explicitly toggle trust in Certificate Trust Settings. It’s a deliberate iOS policy to prevent profile-based trust injection: annoying, and correct.

-

I’ve been dogfooding this since v0.10 shipped in early April. The phone resolves through Numa over DoT whenever I’m home; persistent connections are visible in the log as a single source port living through dozens of queries. The one real caveat: if the laptop’s LAN IP changes, the profile breaks. RFC 9462 DDR fixes that: Numa can respond to _dns.resolver.arpa IN SVCB with its current IP, and iOS picks it up on each network join. Next piece of work.

-

What I learned

-

RFC-level small, API-level hard. RFC 7858 is short. The framing is trivial. But the subtle stuff (ALPN, timeouts, connection caps, handshake vs idle vs write deadlines, backoff on accept errors) isn’t in the RFC. Miss any of it and you leak a DoS vector or a protocol confusion hole.

-

Your test matrix is your security matrix. Both bugs in this post were hidden by lenient clients. In both cases the strict client (kdig, or a specific config combination) surfaced the bug instantly. Pick test tools for strictness, not convenience. The moment you find yourself thinking “but iOS accepts it,” stop and run kdig.

-

Don’t initialize global state via side effects. “Module A installs a global, module B silently depends on it, disabling A breaks B” is a bug pattern that keeps coming back. Fix: have module B initialize its dependency explicitly, even if it means calling an idempotent install_default twice. The dependency graph should be local and obvious.

-

What’s next

-
  • DoH server: shipped in v0.12.0. POST /dns-query accepts RFC 8484 wire-format queries, so Firefox/Chrome can point their built-in DoH at Numa.
  • DoQ server (RFC 9250): DNS over QUIC. Android 14+ supports it natively.
  • DDR (RFC 9462): auto-discovery via _dns.resolver.arpa IN SVCB, so phones pick up a moved Numa instance without the installed profile going stale.
-

The code is at github.com/razvandimescu/numa. The DoT listener is in src/dot.rs, and the phone onboarding flow is in src/setup_phone.rs and src/mobileconfig.rs. MIT license.

-
- - - - -