DNS-over-TLS from Scratch in Rust
The previous post ended with “DoT — the last encrypted transport we don’t support.” This post is about building it.
Numa now runs a DoT listener on port 853. My iPhone uses it as its system resolver, so ad blocking, DNSSEC validation, and recursive resolution follow my phone through the day. No cloud, no account, no companion app — a self-signed cert, a .mobileconfig profile, and a QR code in the terminal.
RFC 7858 is ten pages. The hard parts weren’t in the RFC. They were in cross-protocol confusion defenses, a crypto-provider init gotcha that only triggered in one specific config combination, and a certificate SAN bug iOS was happy to accept and kdig immediately rejected.
This post is about those parts.
Why DoT when you already have DoH?
Numa has shipped DoH since v0.1. Both protocols tunnel DNS over TLS; DoH wraps queries in HTTP/2, DoT is DNS-over-TCP with TLS in front. Same privacy guarantees, different wrapper.
The answer to “why both” is that phones ask for DoT by name. iOS system DNS configures it with two fields (IP + server name) instead of a URL template. Android 9+ “Private DNS” speaks DoT natively. Linux stubs default to DoT. I wanted my phone on Numa without installing anything on the phone itself, and DoT is the protocol iOS and Android already speak for that.
The wire format is refreshingly small
RFC 7858 is one sentence of wire protocol: DNS-over-TCP (RFC 1035 §4.2.2) with TLS in front, on port 853. DNS-over-TCP has existed since 1987 — a 2-byte length prefix followed by the DNS message. DoT is that, wrapped in a TLS session. The entire framing code is seven lines:
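use std::io;
use tokio::io::AsyncWriteExt;

// Frame a DNS message for TCP/TLS: 2-byte big-endian length, then the message.
async fn write_framed<S>(stream: &mut S, msg: &[u8]) -> io::Result<()>
where S: AsyncWriteExt + Unpin {
    // A DNS message never exceeds 65535 bytes, so the length fits in a u16.
    let mut out = Vec::with_capacity(2 + msg.len());
    out.extend_from_slice(&(msg.len() as u16).to_be_bytes());
    out.extend_from_slice(msg);
    // Length prefix and payload go out in a single write.
    stream.write_all(&out).await?;
    stream.flush().await
}
Reads are symmetric: read_exact two bytes, convert to u16, read_exact that many bytes. No HTTP headers, no chunked encoding, no framing layer.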
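The read side described above can be sketched in a few lines. This is a blocking `std::io` stand-in so that it runs without an async runtime, not Numa's tokio code, and the name `read_framed` is illustrative rather than taken from the repository. The writer is repeated in blocking form with an explicit size check in place of the `as u16` cast.

```rust
use std::io::{self, Read, Write};

/// Read one DNS message framed with a 2-byte big-endian length prefix.
fn read_framed<R: Read>(stream: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 2];
    stream.read_exact(&mut len_buf)?;
    let len = usize::from(u16::from_be_bytes(len_buf));
    let mut msg = vec![0u8; len];
    stream.read_exact(&mut msg)?;
    Ok(msg)
}

/// Blocking counterpart of the writer; rejects oversized messages
/// instead of silently truncating the length.
fn write_framed<W: Write>(stream: &mut W, msg: &[u8]) -> io::Result<()> {
    if msg.len() > usize::from(u16::MAX) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "DNS message longer than 65535 bytes",
        ));
    }
    stream.write_all(&(msg.len() as u16).to_be_bytes())?;
    stream.write_all(msg)?;
    stream.flush()
}
```

A five-byte message goes on the wire as `00 05` followed by the five payload bytes, and reading it back returns exactly those five bytes.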
Persistent connections
A fresh TCP+TLS connection costs at least 3 RTTs before the first answer arrives (TCP handshake, TLS handshake, then the query itself): about 300ms on a link with a 100ms round trip, 3× the cost of a UDP query. RFC 7858 §3.4 says clients SHOULD reuse the TCP connection for multiple queries, and every real DoT client does: iOS, Android, systemd, stubby. A single connection often carries hundreds of queries.
The amortization point is the whole game. If you only ever do one query per connection, DoT is roughly 3× slower than UDP and you should not use it. If you reuse the same TLS session for a browsing session’s worth of queries, the handshake is paid once and every subsequent query is effectively free.
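To put numbers on the amortization, here is a back-of-the-envelope sketch. The cost model is an assumption: one RTT for the TCP handshake, one for a TLS 1.3 handshake, and one per query afterwards. Real connections can cost more (TLS 1.2, packet loss), and the function name is made up for illustration.

```rust
/// Average latency per query, in milliseconds, when `queries` share one
/// DoT connection. Assumed model: 2 RTTs of setup (TCP + TLS 1.3),
/// then 1 RTT per query on the established session.
fn avg_latency_ms(rtt_ms: f64, queries: u32) -> f64 {
    assert!(queries > 0, "at least one query is required");
    let setup_ms = 2.0 * rtt_ms;
    (setup_ms + f64::from(queries) * rtt_ms) / f64::from(queries)
}
```

With a 100ms round trip, a single query on a fresh connection averages 300ms, while 100 queries on one connection average 102ms each, which is close to the 100ms a UDP query would take.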
The server is a loop that reads a length-prefixed message, resolves it, writes the response framed the same way, waits for the next one. Three timeouts keep it honest:
- Handshake timeout (10s) — a slowloris that opens TCP but never sends a ClientHello can’t pin a worker.
- Idle timeout (30s) — a connected client with nothing to say gets dropped.
- Write timeout (10s) — a stalled reader can’t hold a response buffer indefinitely.
A semaphore caps concurrent connections at 512 so a burst of handshakes can’t exhaust the tokio runtime.
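The per-connection loop can be sketched as follows. This is a blocking `std` model, not Numa's tokio implementation: `serve_connection`, `apply_timeouts`, and the `resolve` callback are hypothetical names, the TLS layer is left out, and the handshake timeout and the connection semaphore are not modelled. The idle and write deadlines are mapped onto socket read and write timeouts.

```rust
use std::io::{self, ErrorKind, Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Idle (30s) and write (10s) deadlines applied to a blocking socket.
fn apply_timeouts(stream: &TcpStream) -> io::Result<()> {
    stream.set_read_timeout(Some(Duration::from_secs(30)))?;
    stream.set_write_timeout(Some(Duration::from_secs(10)))
}

/// Answer length-prefixed queries until the peer closes the connection.
/// Returns how many queries were served.
fn serve_connection<R, W, F>(reader: &mut R, writer: &mut W, mut resolve: F) -> io::Result<u64>
where
    R: Read,
    W: Write,
    F: FnMut(&[u8]) -> Vec<u8>,
{
    let mut served = 0;
    loop {
        let mut len_buf = [0u8; 2];
        match reader.read_exact(&mut len_buf) {
            Ok(()) => {}
            // EOF at the length prefix is treated as the client hanging up.
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => return Ok(served),
            // Anything else, including an expired read timeout, ends the connection.
            Err(e) => return Err(e),
        }
        let mut query = vec![0u8; usize::from(u16::from_be_bytes(len_buf))];
        reader.read_exact(&mut query)?;

        let response = resolve(&query);
        if response.len() > usize::from(u16::MAX) {
            return Err(io::Error::new(ErrorKind::InvalidData, "response too large"));
        }
        writer.write_all(&(response.len() as u16).to_be_bytes())?;
        writer.write_all(&response)?;
        writer.flush()?;
        served += 1;
    }
}
```

Taking the reader and writer separately keeps the loop testable against in-memory buffers. For a real socket, `&TcpStream` implements both `Read` and `Write`, so the same stream can be passed for both halves after `apply_timeouts`.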
ALPN, the cross-protocol defense that matters
If DoT lives on port 853 and HTTPS on 443, what stops an HTTP/2 client from hitting 853 and getting confused replies? Cross-protocol attacks exist and have had real CVEs. The defense is ALPN: during the TLS handshake the client advertises protocols, the server picks one it supports or fails. A DoT server advertises "dot"; a client offering only "h2" gets a no_application_protocol fatal alert before any frames are exchanged.
rustls enforces this by default when you set alpn_protocols:
let mut config = ServerConfig::builder()
    .with_no_client_auth()
    .with_single_cert(certs, key)?;
config.alpn_protocols = vec![b"dot".to_vec()];
“The library enforces it by default” has a latent risk: a future rustls upgrade could change the default, and the defense would quietly evaporate. I wrote a test that pins the behavior so any regression in a dependency update fails loudly:
#[tokio::test]
async fn dot_rejects_non_dot_alpn() {
    let (addr, cert_der) = spawn_dot_server().await;
    let client_config = dot_client(&cert_der, vec![b"h2".to_vec()]);
    let connector = tokio_rustls::TlsConnector::from(client_config);
    let tcp = tokio::net::TcpStream::connect(addr).await.unwrap();
    let result = connector
        .connect(ServerName::try_from("numa.numa").unwrap(), tcp)
        .await;
    assert!(result.is_err(),
        "DoT server must reject ALPN that doesn't include \"dot\"");
}
When you’re leaning on a library’s default for a security-critical invariant, the test is the contract.
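The selection rule that the test pins down is small enough to model directly. The sketch below follows the RFC 7301 rule for a client that did send an ALPN list; it is a standalone illustration, not rustls code, and the `Alpn` enum and `negotiate` function are invented names. What a server does when the client sends no ALPN extension at all is a separate policy decision and is not covered here.

```rust
/// Outcome of server-side ALPN negotiation.
#[derive(Debug, PartialEq)]
enum Alpn<'a> {
    Selected(&'a [u8]),
    /// Corresponds to the fatal `no_application_protocol` alert.
    NoApplicationProtocol,
}

/// Pick the first server-preferred protocol that the client also offered.
fn negotiate<'a>(server: &[&'a [u8]], client_offer: &[&[u8]]) -> Alpn<'a> {
    for &proto in server {
        if client_offer.iter().any(|&offered| offered == proto) {
            return Alpn::Selected(proto);
        }
    }
    Alpn::NoApplicationProtocol
}
```

A server that lists only `dot` selects it for a client offering `h2` and `dot`, and fails the handshake for a client offering only `h2`.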
Two bugs that hid for days
Both were fixed before v0.10 shipped. Both stayed hidden because my initial tests used permissive clients.
The rustls crypto provider panic
rustls 0.23 requires a CryptoProvider installed before you can build a ServerConfig. Numa’s HTTPS proxy calls install_default as a side effect when it builds its own config, so DoT “just worked” for users who enabled both — the proxy had already initialized the provider before DoT’s first handshake.
Then I added support for user-provided DoT certificates. Someone running DoT with their own Let’s Encrypt cert, with the HTTPS proxy disabled, would hit:
thread 'dot' panicked at rustls-0.23.25/src/crypto/mod.rs:185:14:
no process-level CryptoProvider available -- call
CryptoProvider::install_default() before this point
The panic happened on the first client connection, not at startup. I hit it while writing the integration suite for “DoT with BYO cert, proxy disabled”, the one combination nobody had ever actually exercised: the first run panicked. The fix is two lines: call install_default inside load_tls_config so DoT can stand alone. If a side effect initializes something and you have a path that skips that side effect, you have a bug waiting for a specific deployment.
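The shape of the fix can be shown without rustls. The sketch below models a process-wide provider with `std::sync::OnceLock`; the `Provider` type and the function bodies are stand-ins written for this example and only borrow the names `install_default` and `load_tls_config` from the text above. The point is that the loader installs its own dependency, and that installing twice is harmless.

```rust
use std::sync::OnceLock;

/// Stand-in for a process-wide crypto provider (hypothetical type).
#[derive(Debug)]
struct Provider {
    name: &'static str,
}

static PROVIDER: OnceLock<Provider> = OnceLock::new();

/// Idempotent install: the first caller initializes, later calls are no-ops.
fn install_default() -> &'static Provider {
    PROVIDER.get_or_init(|| Provider { name: "default" })
}

/// DoT config loader that installs its dependency explicitly instead of
/// relying on another module having done so earlier.
fn load_tls_config() -> String {
    let provider = install_default();
    format!("tls config backed by {} provider", provider.name)
}
```

With this structure it no longer matters whether another module called `install_default` first, because `load_tls_config` works on its own and a second install returns the same instance.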
The SAN bug iOS was happy to accept
Numa’s self-signed DoT cert is generated on first run from a local CA alongside the data directory. It needs to match whatever ServerName the client sends as SNI. For the HTTPS proxy, that’s the wildcard domain pattern *.numa (matching frontend.numa, api.numa, etc.). I initially reused the same SAN list for DoT: a wildcard *.numa and nothing else.
On an iPhone this worked perfectly. Full browsing session, persistent connections in the log, ad blocking active. I was about to merge when I ran one last smoke test with kdig (GnuTLS-backed, from Knot DNS):
$ kdig @192.168.1.16 -p 853 +tls \
    +tls-ca=/usr/local/var/numa/ca.pem \
    +tls-hostname=numa.numa example.com A

;; TLS, handshake failed (Error in the certificate.)
Huh.
RFC 6125 §6.4.3: a wildcard in a certificate’s DNS-ID matches exactly one label. *.numa matches frontend.numa, but not numa.numa, because the wildcard wants at least one label to substitute and strict clients reject wildcards in the leftmost label under single-label TLDs as ambiguous.
iOS’s TLS stack is lenient and accepts it. GnuTLS, NSS (Firefox), and most non-Apple validators don’t. The fix is five lines — add an explicit numa.numa SAN alongside the wildcard. But the lesson is the one that stuck: I wrote a commit message saying “fix an iOS bug” and had to rewrite it, because iOS was fine. The real bug was that every GnuTLS/NSS-based client on the planet would have rejected the cert, and I only found it by running one more test with a stricter tool.
Test with the strict client. The permissive client hides your bugs.
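The difference between the two kinds of client can be captured in a toy matcher. This is a model of the behaviour described above, not code from GnuTLS, NSS, or Apple's stack, and real validators differ in detail. The `min_suffix_labels` parameter is an assumption made for the sketch: 1 stands for a lenient validator, 2 for a strict one that refuses a wildcard sitting directly under a single-label suffix.

```rust
/// Does a certificate DNS name (`san`) cover `host`?
/// `min_suffix_labels` is how many labels must follow a `*.` wildcard.
fn san_covers(san: &str, host: &str, min_suffix_labels: usize) -> bool {
    let san = san.to_ascii_lowercase();
    let host = host.to_ascii_lowercase();
    match san.strip_prefix("*.") {
        // No wildcard: exact, case-insensitive comparison.
        None => san == host,
        Some(suffix) => {
            if suffix.split('.').count() < min_suffix_labels {
                return false; // wildcard too close to the root for this policy
            }
            // The wildcard stands for exactly one non-empty leftmost label.
            match host.split_once('.') {
                Some((first, rest)) => !first.is_empty() && rest == suffix,
                None => false,
            }
        }
    }
}

/// A certificate covers a host if any of its SANs does.
fn cert_covers(sans: &[&str], host: &str, min_suffix_labels: usize) -> bool {
    sans.iter().any(|san| san_covers(san, host, min_suffix_labels))
}
```

In this model a cert carrying only `*.numa` covers `numa.numa` for the lenient policy and fails for the strict one, while a cert carrying both `*.numa` and `numa.numa` passes under either, which is what the explicit SAN buys.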
Getting your phone onto it
A DoT server is useless without a way to point a phone at it. iOS won’t let you type an IP and a server name into Settings directly — you install a .mobileconfig profile that bundles the CA as a trust anchor and the DNS settings in a single payload.
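For orientation, the DNS half of such a profile is a small property-list dictionary. The sketch below renders only that fragment, using the key names from Apple's DNS settings payload (`DNSSettings`, `DNSProtocol`, `ServerAddresses`, `ServerName`) as I understand them. It is not the profile Numa generates: the function name is hypothetical, the surrounding payload metadata and the CA payload are omitted, and no XML escaping is done because the inputs are assumed to be a plain IP address and hostname.

```rust
/// Render the `DNSSettings` dictionary for a DoT resolver.
/// Assumes `server_ip` and `server_name` contain no XML special characters.
fn dns_settings_dict(server_ip: &str, server_name: &str) -> String {
    [
        "<key>DNSSettings</key>".to_string(),
        "<dict>".to_string(),
        "  <key>DNSProtocol</key><string>TLS</string>".to_string(),
        format!(
            "  <key>ServerAddresses</key><array><string>{}</string></array>",
            server_ip
        ),
        format!("  <key>ServerName</key><string>{}</string>", server_name),
        "</dict>".to_string(),
    ]
    .join("\n")
}
```

The two inputs are the same two fields mentioned earlier: the resolver's IP address and the server name the certificate has to match.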
Numa ships a subcommand that builds one on the fly and serves it over a QR code in the terminal:
$ numa setup-phone

  Numa Phone Setup

  Profile URL: http://192.168.1.10:8765/mobileconfig

  ██████████████████████████████
  ██                          ██
  ██   [QR code rendered in   ██
  ██      your terminal]      ██
  ██                          ██
  ██████████████████████████████

  On your iPhone:
  1. Open Camera, point at the QR code, tap the yellow banner
  2. Allow the download when Safari asks
  3. Open Settings — tap "Profile Downloaded" near the top
     (or: Settings → General → VPN & Device Management → Numa DNS)
  4. Tap Install (top right), enter passcode, Install again
  5. Settings → General → About → Certificate Trust Settings
     Toggle ON "Numa Local CA" — required for DoT to work
The same QR is available in the dashboard — click “Phone Setup” in the header and the popover renders an SVG QR code pointing at the mobileconfig URL. On mobile viewports it shows a direct download link instead.
Step 5 is non-negotiable. Even though the CA is bundled in the same profile that installs the DNS settings, iOS still requires the user to explicitly toggle trust in Certificate Trust Settings. It’s a deliberate iOS policy to prevent profile-based trust injection — annoying, and correct.
I’ve been dogfooding this since v0.10 shipped in early April. The phone resolves through Numa over DoT whenever I’m home; persistent connections are visible in the log as a single source port living through dozens of queries. The one real caveat: if the laptop’s LAN IP changes, the profile breaks. RFC 9462 DDR fixes that — Numa can respond to _dns.resolver.arpa IN SVCB with its current IP and iOS picks it up on each network join. Next piece of work.
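As a rough picture of what such a DDR answer could look like, the sketch below formats an SVCB record in zone-file presentation form for a DoT resolver. It is speculative: Numa does not ship this yet, the function name and TTL are invented, and whether a client accepts the answer also depends on DDR's verification rules, which the sketch does not cover. The `alpn`, `port`, and `ipv4hint` parameters are standard SVCB parameters.

```rust
use std::net::Ipv4Addr;

/// Format a DDR answer advertising a DoT resolver at `target` on port 853.
fn ddr_svcb_record(target: &str, ip: Ipv4Addr, ttl_secs: u32) -> String {
    format!(
        "_dns.resolver.arpa. {} IN SVCB 1 {}. alpn=dot port=853 ipv4hint={}",
        ttl_secs, target, ip
    )
}
```

Because the record is built from the current address on every query, a changed LAN IP would show up in the next answer rather than requiring a new profile.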
What I learned
RFC-level small, API-level hard. RFC 7858 is ten pages. The framing is trivial. But the subtle stuff — ALPN, timeouts, connection caps, handshake vs idle vs write deadlines, backoff on accept errors — isn’t in the RFC. Miss any of it and you leak a DoS vector or a protocol confusion hole.
Your test matrix is your security matrix. Both bugs in this post were hidden by lenient clients. In both cases the strict client — kdig, or a specific config combination — surfaced the bug instantly. Pick test tools for strictness, not convenience. The moment you find yourself thinking “but iOS accepts it,” stop and run kdig.
Don’t initialize global state via side effects. “Module A installs a global, module B silently depends on it, disabling A breaks B” is a bug pattern that keeps coming back. Fix: have module B initialize its dependency explicitly, even if it means calling an idempotent install_default twice. The dependency graph should be local and obvious.
What’s next
- DoH server — shipped in v0.12.0. POST /dns-query accepts RFC 8484 wire-format queries, so Firefox/Chrome can point their built-in DoH at Numa.
- DoQ server (RFC 9250) — DNS over QUIC. Android 14+ supports it natively.
- DDR (RFC 9462) — auto-discovery via _dns.resolver.arpa IN SVCB, so phones pick up a moved Numa instance without the installed profile going stale.
The code is at github.com/razvandimescu/numa — the DoT listener is in src/dot.rs and the phone onboarding flow is in src/setup_phone.rs and src/mobileconfig.rs.
MIT license.