feat(windows): run as a real SCM service, not a Run-key autostart #107

Merged
razvandimescu merged 16 commits from feat/windows-service into main 2026-04-17 07:02:43 +08:00
razvandimescu commented 2026-04-16 04:53:00 +08:00 (Migrated from github.com)

Summary

  • Replaces the HKLM\…\Run login-time autostart with a real Windows service registered via sc create. Boot-time, SYSTEM context, auto-restart on crash (5s/5s/10s). Fixes the "every Windows debug session is blind" problem that stalled #102's tester cycle — the service now has a lifecycle Get-Service can see, instead of a user-session process with no stderr sink.
  • Extracts main.rs's inline server body (~500 lines) into numa::serve::run(config_path) so both the interactive CLI and the Windows service dispatcher drive the same startup loop. main.rs is now a thin sync dispatcher; each subcommand owns its own tokio runtime.
  • Adds src/windows_service.rs wrapping Mullvad's windows-service crate (Windows-only dep, feature-gated — zero impact on macOS/Linux binaries). ServiceMain spawns tokio on a dedicated thread, reports Running, handles Stop/Shutdown via recv_timeout, reports Stopped.
  • New stable install path: %PROGRAMDATA%\numa\bin\numa.exe (the user's Downloads folder is not a durable binPath for SCM). install_service_binary copies into place idempotently; uninstall_windows stops + deletes the service and removes the binary before restoring DNS.
  • Keeps existing numa install / numa uninstall / numa service start|stop|restart|status CLI shape — users see the same commands, only the underlying mechanism changes.
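To make the ServiceMain lifecycle concrete, here is a std-only Rust sketch of the same shape. The server runs on a dedicated thread, Running is reported, and the dispatcher polls a control channel with recv_timeout until Stop or Shutdown arrives, then reports Stopped. This is not the PR's code. The Status and Control enums and the service_main signature are hypothetical stand-ins for the windows-service crate's status and control types. The server thread here only blocks on a shutdown channel, where the real one would build a tokio runtime and call numa::serve::run.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the status values a service reports to the SCM.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    StartPending,
    Running,
    StopPending,
    Stopped,
}

/// Hypothetical stand-in for SCM control events delivered by the control handler.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Control {
    Stop,
    Shutdown,
    Interrogate,
}

/// Sketch of a ServiceMain-style loop. It returns the statuses it "reported"
/// so the sequence can be inspected. Real code would send them to the SCM.
fn service_main(controls: mpsc::Receiver<Control>) -> Vec<Status> {
    let mut reported = vec![Status::StartPending];

    // Dedicated server thread. Here it only waits for a shutdown signal;
    // the real one would build a tokio runtime and run the server on it.
    let (shutdown_tx, shutdown_rx) = mpsc::channel::<()>();
    let server = thread::spawn(move || {
        let _ = shutdown_rx.recv();
    });
    reported.push(Status::Running);

    // Poll with a timeout so the loop can also notice the server thread
    // exiting on its own (for example after a fatal startup error).
    loop {
        match controls.recv_timeout(Duration::from_millis(250)) {
            Ok(Control::Stop) | Ok(Control::Shutdown) => break,
            Ok(Control::Interrogate) => continue,
            Err(RecvTimeoutError::Timeout) => {
                if server.is_finished() {
                    break;
                }
            }
            // The control handler went away: treat it like a stop request.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }

    reported.push(Status::StopPending);
    let _ = shutdown_tx.send(()); // ask the server thread to exit
    let _ = server.join();
    reported.push(Status::Stopped);
    reported
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Simulate the SCM: one status query, then a stop request.
    tx.send(Control::Interrogate).unwrap();
    tx.send(Control::Stop).unwrap();
    // prints [StartPending, Running, StopPending, Stopped]
    println!("{:?}", service_main(rx));
}
```

Polling with a timeout rather than a blocking recv is what lets the loop wake up periodically and check whether the server thread has finished. A blocking recv would only return when a control event arrives or the sender is dropped.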

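One possible reading of "install_service_binary copies into place idempotently" is sketched below, again in std-only Rust. install_binary and its byte-for-byte comparison are assumptions, not the PR's implementation. The helper creates the target directory and skips the copy when the source is already the installed file or has identical contents. It also reports whether it wrote anything. The demo uses a scratch directory instead of %PROGRAMDATA% so it can run anywhere without touching a real install.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not the PR's install_service_binary): copy `src` to
/// `dest_dir/numa.exe` only if the destination is missing or differs.
/// Returns the destination path and whether a copy was performed.
fn install_binary(src: &Path, dest_dir: &Path) -> io::Result<(PathBuf, bool)> {
    fs::create_dir_all(dest_dir)?;
    let dest = dest_dir.join("numa.exe");

    // Already running from the install location: nothing to copy.
    if let (Ok(a), Ok(b)) = (fs::canonicalize(src), fs::canonicalize(&dest)) {
        if a == b {
            return Ok((dest, false));
        }
    }

    // Byte-for-byte comparison; a missing or unreadable destination counts as "differs".
    let identical = match (fs::read(src), fs::read(&dest)) {
        (Ok(a), Ok(b)) => a == b,
        _ => false,
    };
    if !identical {
        fs::copy(src, &dest)?;
    }
    Ok((dest, !identical))
}

fn main() -> io::Result<()> {
    // Demo in a scratch directory instead of the real %PROGRAMDATA%\numa\bin.
    let scratch = env::temp_dir().join("numa-install-sketch");
    fs::create_dir_all(&scratch)?;
    let src = scratch.join("numa-download.exe");
    fs::write(&src, b"pretend binary")?;

    let bin_dir = scratch.join("bin");
    let (_, first) = install_binary(&src, &bin_dir)?;
    let (dest, second) = install_binary(&src, &bin_dir)?;
    println!("{}: first copy {first}, second copy {second}", dest.display());

    fs::remove_dir_all(&scratch)
}
```

On Windows, overwriting an executable that is currently running generally fails. A reinstall flow would therefore presumably need to stop the service before copying a new binary over the old one.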
Intentionally out of scope (separate PRs):

  • Bug where netsh set dnsservers static is destructive (clears interface DNS on probe failure)
  • IPv6 DNS leak (only IPv4 is configured)
  • Uninstall secondary-server drop (let _ on netsh add)
  • Dnscache disable / reboot requirement (needs empirical verification on fresh Win11)
  • File logging sink for service mode (next-highest priority once this ships)

Test plan

  • make all — fmt + clippy + audit + 285 lib tests, all pass
  • Debian Bookworm container build (rust:1.94-bookworm) — binary builds clean with the refactor
  • Debian Bookworm + systemd end-to-end: numa install → systemctl is-active numa = active (running), dig @127.0.0.1 google.com resolves correctly, journal shows new numa::serve:: / numa::ctx:: module paths
  • numa uninstall round-trip on Linux — resolv.conf restored from backup, service unit removed, DNS still works
  • Interactive numa on macOS — binary starts, serves on loopback, stats API responds
  • Windows SCM end-to-end — cannot test on macOS/Linux. Requires manual validation on a Windows box with the CI-built artifact: numa install (elevated) → reboot → Get-Service Numa shows Running → dig @127.0.0.1 google.com resolves → numa uninstall cleans up.
  • macOS launchd re-install — unchanged code path, logically symmetric to the verified Linux systemd flow; skipping to avoid disrupting my daily-driver numa. An optional make deploy on a fresh macOS box would cover it.
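The dig @127.0.0.1 google.com step in the Windows plan could also be scripted. The following is a hedged, std-only Rust sketch, not part of this PR. It sends one A query for google.com to 127.0.0.1:53 over UDP and checks that the reply carries the same ID and RCODE 0 (NOERROR). It assumes numa answers plain UDP queries on port 53. build_query and resolve_check are hypothetical helper names.

```rust
use std::io;
use std::net::UdpSocket;
use std::time::Duration;

/// Builds a minimal DNS query (RFC 1035 wire format) for an A record with
/// recursion desired. This sketch does not validate label lengths (max 63).
fn build_query(id: u16, name: &str) -> Vec<u8> {
    let mut pkt = Vec::with_capacity(32);
    pkt.extend_from_slice(&id.to_be_bytes());
    pkt.extend_from_slice(&0x0100u16.to_be_bytes()); // flags: standard query, RD=1
    pkt.extend_from_slice(&1u16.to_be_bytes()); // QDCOUNT = 1
    pkt.extend_from_slice(&[0, 0, 0, 0, 0, 0]); // ANCOUNT, NSCOUNT, ARCOUNT = 0
    for label in name.trim_end_matches('.').split('.') {
        pkt.push(label.len() as u8);
        pkt.extend_from_slice(label.as_bytes());
    }
    pkt.push(0); // root label terminates QNAME
    pkt.extend_from_slice(&1u16.to_be_bytes()); // QTYPE = A
    pkt.extend_from_slice(&1u16.to_be_bytes()); // QCLASS = IN
    pkt
}

/// Sends the query to `server` and returns the header's answer count if the
/// reply matches our ID and has RCODE 0 (NOERROR).
fn resolve_check(server: &str, name: &str) -> io::Result<u16> {
    let sock = UdpSocket::bind("127.0.0.1:0")?;
    sock.set_read_timeout(Some(Duration::from_secs(2)))?;
    sock.connect(server)?;
    let id = 0x4e55; // arbitrary fixed query ID
    sock.send(&build_query(id, name))?;

    let mut buf = [0u8; 512];
    let n = sock.recv(&mut buf)?;
    if n < 12 || u16::from_be_bytes([buf[0], buf[1]]) != id {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "short or mismatched reply"));
    }
    let rcode = buf[3] & 0x0f;
    if rcode != 0 {
        return Err(io::Error::new(io::ErrorKind::Other, format!("rcode {rcode}")));
    }
    Ok(u16::from_be_bytes([buf[6], buf[7]]))
}

fn main() {
    match resolve_check("127.0.0.1:53", "google.com") {
        Ok(answers) => println!("resolved: {answers} answer record(s)"),
        Err(e) => println!("check failed: {e}"),
    }
}
```

Unlike dig, this only inspects the reply header, so it confirms that the resolver answered successfully but does not print the returned addresses.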