# NixOS CI/CD Automated Deployment with deploy-rs

## Overview

Implement a push-based automated deployment pipeline using **deploy-rs** for the homelab NixOS fleet. The pipeline builds on every push/PR, deploys on merge to `main`, and supports `test-` branches for non-persistent trial deployments.

---

## Architecture

```
┌─────────────┐    push     ┌──────────────────┐
│  Developer  │────────────▶│  Forgejo (Git)   │
└─────────────┘             └────────┬─────────┘
                                     │
                    ┌────────────────┼────────────────┐
                    ▼                ▼                ▼
             ┌─────────────┐   ┌───────────┐   ┌──────────────┐
             │  CI: Build  │   │ CI: Check │   │  CI: Deploy  │
             │  all hosts  │   │  flake +  │   │  (main only) │
             │ (every push)│   │ deployChk │   │ via deploy-rs│
             └──────┬──────┘   └───────────┘   └──────┬───────┘
                    │                                 │ SSH
                    ▼                                 ▼
             ┌─────────────┐              ┌──────────────────┐
             │  Harmonia   │◀─── push ────│   Target Hosts   │
             │ Binary Cache│─── pull ────▶│ (NixOS machines) │
             └─────────────┘              └──────────────────┘
```

---

## Key Design Decisions

### Test branch activation (`test-`)

deploy-rs's `activate.nixos` calls `switch-to-configuration switch` by default, which updates the bootloader. For test branches, we create a **separate profile** using `activate.custom` that calls `switch-to-configuration test` instead. This activates the configuration immediately but **does not update the bootloader**, so on reboot the host falls back to the last `switch`-deployed generation.

Magic rollback still works on test deployments: deploy-rs confirms the host is reachable after activation and auto-reverts if it can't connect.

```nix
# Test activation: active now, but reboot reverts to previous boot entry
activate.custom base.config.system.build.toplevel ''
  cd /tmp
  $PROFILE/bin/switch-to-configuration test
''
```

### Zero duplication in `flake.nix`

Use `builtins.mapAttrs` over `self.nixosConfigurations` to generate `deploy.nodes` automatically. Host metadata (IP, whether to enable deploy) is stored once per host config.
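
The branch-driven behaviour described in these design decisions could be captured in one small dispatch function. The sketch below is only an illustration: the function name `deploy_target` and the profile name `test` are hypothetical, and the `test-<Host>` naming is inferred from the `test-Development` example in the verification plan.

```bash
#!/usr/bin/env bash
# Map a branch name to a deploy-rs flake target.
# Assumptions (illustrative only): test branches are named test-<Host>, and
# the test activation is exposed as a profile called "test" on that node.
deploy_target() {
  local branch="$1"
  case "$branch" in
    main)    echo "." ;;                        # every node in the flake
    test-?*) echo ".#${branch#test-}.test" ;;   # one host, test profile only
    *)       return 1 ;;                        # other branches: build only
  esac
}

# Hypothetical use in the deploy job:
#   target="$(deploy_target "$GITHUB_REF_NAME")" && deploy "$target"
```

Keeping the decision in a pure function makes it easy to check without touching any host. One caveat to verify: a node-level target deploys every profile on that node, so if `test` lives next to `system`, the `main` case would likely need per-profile targets such as `.#<Host>.system`.
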

### Renovate bot compatibility

The pipeline is fully compatible with Renovate:

- **Minor/patch updates**: Renovate opens a PR → CI builds all hosts → Renovate auto-merges → CI deploys (uses `switch`, updates bootloader)
- **Major updates**: Renovate opens a PR → CI builds → waits for manual review → merge → deploy with `switch` (persists across reboot)
- The deploy step differentiates using the **branch name**, not the commit source, so Renovate PRs behave identically to human PRs

### System version upgrades (kernel, etc.)

When a deployment requires a reboot (e.g., kernel upgrade):

1. CI deploys with the `--boot` flag → calls `switch-to-configuration boot` (sets the new generation as boot default without activating it)
2. A separate reboot step (manual or scheduled) activates the change

> [!IMPORTANT]
> deploy-rs does not auto-detect whether a reboot is needed. The workflow can check whether the kernel or initrd changed and conditionally use `--boot` instead, or always use `switch` and document that the operator should reboot after changes that only take effect at boot (such as a new kernel or initrd).
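
One possible shape for the kernel/initrd check mentioned above is to compare the `kernel`, `initrd`, and `kernel-modules` links of the booted system with those of the new closure. This is a sketch under assumptions: the function names are hypothetical, and it has to run where both paths are visible (for example on the target host, after the closure has been copied).

```bash
#!/usr/bin/env bash
# Decide between a live switch and a boot-only deploy by comparing the
# boot-critical links of two NixOS system closures.
# $1 = booted system (e.g. /run/booted-system), $2 = new system toplevel.
needs_reboot() {
  local booted="$1" new="$2" part
  for part in kernel initrd kernel-modules; do
    if [ "$(readlink -f "$booted/$part")" != "$(readlink -f "$new/$part")" ]; then
      return 0   # at least one boot-critical component differs
    fi
  done
  return 1       # same kernel, initrd and modules: a live switch is enough
}

# Print "boot" when a reboot is needed, "switch" otherwise.
deploy_mode() {
  if needs_reboot "$1" "$2"; then echo boot; else echo switch; fi
}
```

The workflow could then pass `--boot` to deploy-rs only when `deploy_mode` prints `boot`, and leave the reboot itself to the separate manual or scheduled step.
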

---

## Security & Trust Boundaries

### Trust model diagram

```
┌─────────────────────────────────────────────────────┐
│                    TRUST ZONE 1                     │
│               Developer Workstations                │
│  • Holds sops-nix age keys (decrypt secrets)        │
│  • Holds GPG/SSH keys for signed commits            │
│  • Can manually deploy via deploy-rs                │
│  • Can push to any branch                           │
└──────────────────────┬──────────────────────────────┘
                       │ git push (signed commits)
                       ▼
┌─────────────────────────────────────────────────────┐
│                    TRUST ZONE 2                     │
│                 Forgejo + CI Runner                 │
│  • Holds CI SSH deploy key (DEPLOY_SSH_KEY secret)  │
│  • Does NOT hold sops-nix age keys                  │
│  • Branch protection: main requires PR + checks     │
│  • Can only deploy via the deploy user              │
│  • Builds are sandboxed in Nix                      │
└──────────────────────┬──────────────────────────────┘
                       │ SSH as "deploy" user
                       ▼
┌─────────────────────────────────────────────────────┐
│                    TRUST ZONE 3                     │
│                 Target NixOS Hosts                  │
│  • deploy user: system user, no shell login         │
│  • sudo: ONLY nix-env --set and                     │
│    switch-to-configuration (NOPASSWD)               │
│  • No write access to /etc, home dirs, etc.         │
│  • sops secrets decrypted at activation via host    │
│    age keys (not CI keys)                           │
└─────────────────────────────────────────────────────┘
```

### What each actor can do

| Actor | Can build | Can deploy | Can decrypt secrets | Can access hosts |
|---|---|---|---|---|
| Developer | ✅ | ✅ (manual) | ✅ (personal age keys) | ✅ (personal SSH) |
| CI runner | ✅ | ✅ (deploy user) | ❌ | Limited (deploy user) |
| deploy user | ❌ | ✅ (sudo restricted) | ❌ | N/A (runs on host) |
| Host age key | ❌ | ❌ | ✅ (own secrets only) | N/A |

### Hardening measures

1. **Branch protection** on `main`: require PR, passing checks, optional signed commits
2. **deploy user** ([`users/deploy/default.nix`](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/users/deploy/default.nix)): restricted sudoers, no home dir, system user
3. **CI secret isolation**: SSH key only, no age keys in CI. Secrets are decrypted on-host at activation time by sops-nix using host-specific age keys
4. **Magic rollback**: if a deploy renders the host unreachable, deploy-rs auto-reverts within the confirm timeout
5. **`nix flake check` + `deployChecks`**: validate the flake structure and deploy configuration before any deployment

> [!NOTE]
> The deploy user SSH key is stored as a Forgejo Actions secret. Even if the CI runner is compromised, the attacker can only push Nix store paths and trigger `switch-to-configuration`. They cannot decrypt secrets with anything held by CI, and the restricted sudoers rules limit them to that activation path. Activating an attacker-built configuration still runs as root, however, so the deploy key should be guarded as a high-value credential.

---

## Proposed Changes

### 1. Flake configuration

#### [MODIFY] [flake.nix](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/flake.nix)

- Add `deploy-rs` to flake inputs
- Auto-generate `deploy.nodes` from `self.nixosConfigurations` using `mapAttrs` (**zero duplication**)
- Add `checks` output via `deploy-rs.lib.deployChecks`
- Define a helper that reads each host's `config.networking` for hostname/IP

```nix
# Sketch of the deploy output (no per-host duplication)
# `lib` is assumed to be nixpkgs.lib
deploy.nodes = builtins.mapAttrs (name: nixos: {
  hostname = nixos.config.homelab.deploy.targetHost; # defined per host
  sshUser = "deploy";
  user = "root";
  magicRollback = true;
  autoRollback = true;
  profiles.system = {
    path = deploy-rs.lib.x86_64-linux.activate.nixos nixos;
  };
}) (lib.filterAttrs
  # only hosts that opted in to the deploy user
  (name: nixos: nixos.config.homelab.users.deploy.enable or false)
  self.nixosConfigurations);
```

---

### 2. Deploy user module

#### [MODIFY] [default.nix](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/users/deploy/default.nix)

- Add option `homelab.deploy.targetHost` (string, the IP/hostname for deploy-rs to SSH into)
- Support multiple SSH authorized keys (CI key + personal workstation keys)
- Add `nix.settings.trusted-users` option for the deploy user (needed so the host accepts closures pushed to it with `nix copy`)

---

### 3. Enable deploy user on target hosts

#### [MODIFY] Host `default.nix` files (per host)

- Enable `homelab.users.deploy.enable = true` on all deployable hosts
- Set `homelab.deploy.targetHost` to each host's IP (e.g., `"192.168.0.10"` for Ingress)
- Currently only `Niko` has deploy enabled; extend to all non-`Template` hosts

---

### 4. Binary cache (Harmonia)

#### [NEW] [modules/services/harmonia/default.nix](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/modules/services/harmonia/default.nix)

- Create `homelab.services.harmonia` module wrapping `services.harmonia`
- Generates a signing key pair for the cache
- Configures Nginx reverse proxy with HTTPS (via ACME or internal cert)
- All hosts configured to use the cache as a substituter via `nix.settings.substituters`

> [!TIP]
> Harmonia is chosen over attic (simpler, no database needed) and nix-serve (better performance, streaming, zstd compression). It serves your `/nix/store` directly, so the CI runner can `nix copy` built closures to the cache host after a successful build.

#### [NEW] [modules/common/nix-cache.nix](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/modules/common/nix-cache.nix)

- Configure all hosts to use the binary cache as a substituter
- Add the cache's public signing key to `trusted-public-keys`
- Usable by personal devices too (add the cache URL + public key to their `nix.conf`)

---

### 5. CI Workflows

#### [MODIFY] [build.yml](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/.github/workflows/build.yml)

- Use the dynamic `determine-hosts` job output for the build matrix (already partially implemented)
- Add `nix flake check` step for deployChecks validation
- Build all hosts on every push/PR
- Optionally push built closures to the Harmonia cache after successful build

#### [NEW] [deploy.yml](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/.github/workflows/deploy.yml)

- Trigger: push to `main` or `test-*` branches (after build passes)
- Load `DEPLOY_SSH_KEY` from Forgejo Actions secrets
- **For `main`**: `deploy .` (all hosts, `switch-to-configuration switch`)
- **For `test-`**: deploy only the matching host with a **test profile** (`switch-to-configuration test`), with no bootloader update
- Magic rollback enabled by default
- Optional `--boot` mode for kernel upgrades (triggered by label or manual dispatch)

#### [NEW] [check.yml](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/.github/workflows/check.yml)

- Runs `nix flake check` (includes deployChecks)
- Runs `nix eval` to validate all configurations parse correctly
- Can be required as a status check for Renovate auto-merge rules

---

### 6. Monitoring

#### [NEW] [modules/services/monitoring/default.nix](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/modules/services/monitoring/default.nix)

- Enable node exporter on all hosts for Prometheus scraping
- Export NixOS generation info: current generation, boot generation, system version
- Optionally integrate with the existing infrastructure (e.g., Prometheus on Production)

Script/service to export NixOS deploy state:

```bash
# Metrics like:
# nixos_current_generation{host="Niko"} 42
# nixos_boot_generation{host="Niko"} 42  # same = no pending reboot
# nixos_config_age_seconds{host="Niko"} 3600
```

When `current_generation != boot_generation`, the host has a test deployment active (or needs a reboot).

---

### 7. Local VM Testing

#### [NEW] [test/vm-test.nix](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/test/vm-test.nix)

NixOS has built-in VM testing via `nixos-rebuild build-vm` and the NixOS test framework. The approach:

1. **Build a VM from any host config**:

   ```bash
   nix build .#nixosConfigurations.Testing.config.system.build.vm
   ./result/bin/run-Testing-vm
   ```

2. **NixOS integration test** (`test/vm-test.nix`):
   - Spins up a minimal VM cluster (e.g., two nodes)
   - Runs deploy-rs against one VM from the other
   - Validates activation, rollback, and connectivity
   - Uses the NixOS test framework (Python test driver)

3. **Full CI pipeline test locally with `act`**:

   ```bash
   # Run the GitHub Actions workflow locally using act
   act push --container-architecture linux/amd64
   ```

> [!NOTE]
> The existing `build.yml` already uses `catthehacker/ubuntu:act-24.04` containers, suggesting `act` is already part of the workflow. VM tests don't require actual network access to target hosts.

---

## Verification Plan

### Automated Tests

- `nix flake check`: validates flake + deployChecks schema
- `nix build .#nixosConfigurations..config.system.build.toplevel` for each host
- NixOS VM integration test (`test/vm-test.nix`)

### Manual Verification (guinea pig: `Development` or `Testing`)

1. Push to `test-Development` → verify deploy-rs runs `switch-to-configuration test` on 192.168.0.91
2. Reboot `Development` → verify it falls back to previous generation (test branch behavior)
3. Merge to `main` → verify deploy-rs deploys to all enabled hosts with `switch`
4. Intentionally break a config → verify magic rollback activates
5. Push to Harmonia cache → verify another host can pull the closure
6. Check monitoring metrics show correct generation numbers
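
For the last check, the generation metrics could come from a script along these lines. It is a sketch under stated assumptions, not the module's implementation: the function names are hypothetical, "boot generation" is taken to be the generation the `system` profile link points at, and "current generation" is the generation whose store path matches the running system (reported as `0` when none matches, as would likely happen for a test deployment under a separate profile). `nixos_config_age_seconds` is left out.

```bash
#!/usr/bin/env bash
# Print deploy-state metrics in the Prometheus text exposition format.
# Paths are arguments so the logic can be tried outside a NixOS host.

# "system-42-link" -> 42
generation_number() {
  local name
  name="$(basename "$1")"
  case "$name" in
    *-[0-9]*-link) name="${name%-link}"; echo "${name##*-}" ;;
    *) return 1 ;;
  esac
}

# Generation in profiles dir $1 whose store path equals the system at $2.
generation_of_system() {
  local profiles_dir="$1" want link
  want="$(readlink -f "$2")"
  for link in "$profiles_dir"/system-*-link; do
    [ -e "$link" ] || continue
    if [ "$(readlink -f "$link")" = "$want" ]; then
      generation_number "$link"
      return
    fi
  done
  return 1
}

# $1 = host label, $2 = profiles dir, $3 = running system path.
emit_metrics() {
  local host="$1" profiles_dir="$2" running="$3" current boot
  boot="$(generation_number "$(readlink "$profiles_dir/system")")" || return 1
  # Assumption: report 0 when the running system matches no generation.
  current="$(generation_of_system "$profiles_dir" "$running")" || current=0
  printf 'nixos_current_generation{host="%s"} %s\n' "$host" "$current"
  printf 'nixos_boot_generation{host="%s"} %s\n' "$host" "$boot"
}
```

On a host this might be called as `emit_metrics Niko /nix/var/nix/profiles /run/current-system`, with the output written to a file read by node exporter's textfile collector; the exact directory depends on how the exporter is configured.
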