From a9d3469959ae13679664891b56bab0cc10ef5b22 Mon Sep 17 00:00:00 2001
From: Tibo De Peuter
Date: Tue, 17 Mar 2026 18:40:07 +0100
Subject: [PATCH] docs(binary-cache): Add implementation documentation

---
 docs/binary-cache/binary-cache-options.md |  81 ++++++
 docs/binary-cache/implementation_plan.md  | 288 ++++++++++++++++++++++
 docs/binary-cache/task.md                 |  35 +++
 docs/binary-cache/walkthrough.md          |  55 +++++
 4 files changed, 459 insertions(+)
 create mode 100644 docs/binary-cache/binary-cache-options.md
 create mode 100644 docs/binary-cache/implementation_plan.md
 create mode 100644 docs/binary-cache/task.md
 create mode 100644 docs/binary-cache/walkthrough.md

diff --git a/docs/binary-cache/binary-cache-options.md b/docs/binary-cache/binary-cache-options.md
new file mode 100644
index 0000000..cb3ee4b
--- /dev/null
+++ b/docs/binary-cache/binary-cache-options.md
@@ -0,0 +1,81 @@
+# Nix Binary Cache Options Comparison
+
+This document compares several binary cache solutions for Nix to help decide which best fits your homelab and external development machines.
+
+## Overview of Options
+
+| Option | Type | Backend | Multi-tenancy | Signing | Best For |
+| :--- | :--- | :--- | :--- | :--- | :--- |
+| **Attic** | Self-hosted Server | S3 / Local / PG | Yes | Server-side | Teams/Homelabs with multiple caches and tenants. |
+| **Harmonia** | Self-hosted Server | Local Store | No | Server-side | Simple setups serving a single machine's store. |
+| **nix-serve** | Self-hosted Server | Local Store | No | Server-side | Legacy/Basic setups. |
+| **Cachix** | Managed SaaS | Hosted S3 | Yes | Cloud-managed | Users who want zero maintenance and global speed. |
+| **Simple HTTP/S3** | Static Files | S3 / Web Server | No | Client-side | Minimalist, low-cost static hosting. |
+
+---
+
+## Detailed Analysis
+
+### 1. Attic (The "Modern" Choice)
+Attic is a modern, high-performance Nix binary cache server written in Rust.
+
+* **Benefits:**
+    * **Global Deduplication**: If multiple caches (tenants) contain the same binary, it is stored only once.
+    * **Multi-tenancy**: You can create separate, isolated caches for different projects or users.
+    * **Management CLI**: Comes with an excellent CLI (`attic login`, `attic use`, `attic push`) that makes client configuration trivial.
+    * **Automatic Signing**: The server manages the private keys and signs paths on the fly.
+    * **Garbage Collection**: Supports LRU-based garbage collection.
+* **Downsides:**
+    * **Complexity**: Requires a database (PostgreSQL, or SQLite for small setups) and persistent storage, though it can run in Docker.
+    * **Overhead**: May be overkill for a single-user homelab.
+
+### 2. Harmonia (The "Speed" Choice)
+Harmonia is a fast, lightweight server that serves the local `/nix/store` directly.
+
+* **Benefits:**
+    * **Extreme Performance**: Written in Rust; supports zstd compression and HTTP range requests for streaming.
+    * **Simple Setup**: If you already have a build server, you just run Harmonia on it to expose its store.
+    * **Modern**: Talks to the store through the `nix-daemon` protocol for better security and integration.
+* **Downsides:**
+    * **Single Machine**: Only serves the store of the host it's running on.
+    * **No Multi-tenancy**: No isolation between different caches.
+
+### 3. nix-serve (The "Classic" Choice)
+The original Perl implementation for serving a Nix store.
+
+* **Benefits:**
+    * **Compatibility**: Virtually every Nix system knows how to talk to it.
+* **Downsides:**
+    * **Performance**: Slower than the Rust alternatives; lacks native compression optimizations.
+    * **Maintenance**: Requires a reverse proxy such as Nginx for HTTPS and IPv6 support.
+
+### 4. Cachix (The "No-Maintenance" Choice)
+A managed service that "just works".
+
+* **Benefits:**
+    * **Zero Infrastructure**: No servers to manage.
+    * **Global Reach**: Uses a CDN for fast downloads everywhere.
+* **Downsides:**
+    * **Cost**: Private caches usually require a subscription.
+    * **Privacy**: Your binaries are stored on third-party infrastructure.
+
+### 5. Simple HTTP / S3 (The "Minimalist" Choice)
+Pushing files to a bucket and serving them statically.
+
+* **Benefits:**
+    * **Cheap/Offline**: No server process running.
+    * **Robust**: No database or service to crash.
+* **Downsides:**
+    * **Static Signing**: You must sign binaries on the CI machine before pushing.
+    * **No GC**: Managing deletes in a static bucket is manual and prone to errors.
+
+---
+
+## Recommendation
+
+For your requirement of **Homelab integration + External machines**, **Attic** remains the strongest candidate because:
+1. **Ease of Client Setup**: Your personal machines only need to run `attic login` and `attic use` once.
+2. **CI Synergy**: Gitea Actions can push to Attic using standard tokens without needing SSH access to the server's store.
+3. **Sovereignty**: You keep all your data within your own infrastructure.
+
+If you prefer something simpler that just "exposes" your existing build host, **Harmonia** is the runner-up.
diff --git a/docs/binary-cache/implementation_plan.md b/docs/binary-cache/implementation_plan.md
new file mode 100644
index 0000000..9fc04a1
--- /dev/null
+++ b/docs/binary-cache/implementation_plan.md
@@ -0,0 +1,288 @@
+# NixOS CI/CD Automated Deployment with deploy-rs
+
+## Overview
+
+Implement a push-based automated deployment pipeline using **deploy-rs** for the homelab NixOS fleet. The pipeline builds on every push/PR, deploys on merge to `main`, and supports `test-` branches for non-persistent trial deployments.
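+
+The pipeline adds one new flake input, deploy-rs itself. A minimal sketch of that input follows. It assumes the upstream `serokell/deploy-rs` repository and that this flake's nixpkgs input is named `nixpkgs`:
+
+```nix
+# flake.nix (excerpt): add deploy-rs and make it reuse this flake's nixpkgs
+inputs = {
+  deploy-rs.url = "github:serokell/deploy-rs";
+  deploy-rs.inputs.nixpkgs.follows = "nixpkgs";
+};
+```
+
+Following the flake's own nixpkgs keeps a single nixpkgs revision in `flake.lock`, so Renovate only has one nixpkgs input to bump.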
+
+---
+
+## Architecture
+
+```
+┌─────────────┐    push     ┌──────────────────┐
+│  Developer  │────────────▶│  Forgejo (Git)   │
+└─────────────┘             └────────┬─────────┘
+                                     │
+                    ┌────────────────┼────────────────┐
+                    ▼                ▼                ▼
+             ┌─────────────┐   ┌───────────┐   ┌──────────────┐
+             │ CI: Build   │   │ CI: Check │   │ CI: Deploy   │
+             │ all hosts   │   │ flake +   │   │ (main only)  │
+             │ (every push)│   │ deployChk │   │ via deploy-rs│
+             └──────┬──────┘   └───────────┘   └──────┬───────┘
+                    │                                 │ SSH
+                    ▼                                 ▼
+             ┌─────────────┐              ┌──────────────────┐
+             │  Harmonia   │◀─── push ────│  Target Hosts    │
+             │ Binary Cache│─── pull ────▶│ (NixOS machines) │
+             └─────────────┘              └──────────────────┘
+```
+
+---
+
+## Key Design Decisions
+
+### Test branch activation (`test-`)
+
+deploy-rs's `activate.nixos` calls `switch-to-configuration switch` by default, which updates the bootloader. For test branches, we create a **separate profile** using `activate.custom` that calls `switch-to-configuration test` instead. This activates the configuration immediately but **does not update the bootloader**, so on reboot the host falls back to the last `switch`-deployed generation.
+
+Magic rollback still works on test deployments: deploy-rs confirms the host is reachable after activation and auto-reverts if it can't connect.
+
+```nix
+# Test activation: active now, but a reboot reverts to the previous boot entry.
+# `base` is the host's nixosConfiguration; `deploy-rs` is the flake input.
+deploy-rs.lib.x86_64-linux.activate.custom base.config.system.build.toplevel ''
+  # Work around NixOS/nixpkgs#73404, as deploy-rs's own activate.nixos does
+  cd /tmp
+  $PROFILE/bin/switch-to-configuration test
+''
+```
+
+### Zero duplication in `flake.nix`
+
+Use `builtins.mapAttrs` over `self.nixosConfigurations` to generate `deploy.nodes` automatically. Host metadata (IP, whether to enable deploy) is stored once per host config.
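+
+The per-host metadata can be a single option that each host sets once. The following is a minimal sketch that uses the option name `homelab.deploy.targetHost` from this plan. The file name `modules/common/deploy-target.nix` is hypothetical; the plan itself puts the option in the deploy user module.
+
+```nix
+# modules/common/deploy-target.nix (hypothetical location)
+{ lib, ... }:
+{
+  # Each host then sets it once in its own default.nix, for example:
+  #   homelab.deploy.targetHost = "192.168.0.10"; # Ingress
+  options.homelab.deploy.targetHost = lib.mkOption {
+    type = lib.types.nullOr lib.types.str;
+    default = null;
+    example = "192.168.0.10";
+    description = "IP address or hostname that deploy-rs connects to over SSH.";
+  };
+}
+```
+
+With a `null` default, the `deploy.nodes` generator can also skip any host that has not set a target, in addition to checking whether the deploy user is enabled.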
+
+### Renovate bot compatibility
+
+The pipeline is fully compatible with Renovate:
+- **Minor/patch updates**: Renovate opens a PR → CI builds all hosts → Renovate auto-merges → CI deploys (uses `switch`, updates bootloader)
+- **Major updates**: Renovate opens a PR → CI builds → waits for manual review → merge → deploy with `switch` (persists across reboot)
+- The deploy step differentiates using the **branch name**, not the commit source, so Renovate PRs behave identically to human PRs
+
+### System version upgrades (kernel, etc.)
+
+When a deployment requires a reboot (e.g., kernel upgrade):
+1. CI deploys with the `--boot` flag, which calls `switch-to-configuration boot` (sets the new generation as the boot default without activating it)
+2. A separate reboot step (manual or scheduled) activates the change
+
+> [!IMPORTANT]
+> deploy-rs does not auto-detect whether a reboot is needed. The workflow can either check whether the kernel or initrd changed and conditionally use `--boot`, or always use `switch` and document that the operator must reboot whenever the kernel or initrd has changed.
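+
+One way to surface the "reboot needed" condition is an activation snippet. It compares the kernel, initrd and kernel modules of the running boot with those of the configuration being activated. This is a sketch; the script name `rebootRequired` is hypothetical. `$systemConfig` is the store path that the NixOS activation script sets for the new configuration.
+
+```nix
+{
+  # Prints a warning during `switch`/`test` activation when the new generation
+  # only takes full effect after a reboot.
+  system.activationScripts.rebootRequired = ''
+    booted="$(readlink -f /run/booted-system/{initrd,kernel,kernel-modules})"
+    built="$(readlink -f "$systemConfig"/{initrd,kernel,kernel-modules})"
+    if [ "$booted" != "$built" ]; then
+      echo "reboot required: kernel, initrd or kernel modules changed" >&2
+    fi
+  '';
+}
+```
+
+The message appears only after activation has run, so it is better suited to flagging a host for a reboot than to choosing `--boot` before deploying.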
+
+---
+
+## Security & Trust Boundaries
+
+### Trust model diagram
+
+```
+┌─────────────────────────────────────────────────────┐
+│                    TRUST ZONE 1                     │
+│               Developer Workstations                │
+│  • Holds sops-nix age keys (decrypt secrets)        │
+│  • Holds GPG/SSH keys for signed commits            │
+│  • Can manually deploy via deploy-rs                │
+│  • Can push to any branch                           │
+└──────────────────────┬──────────────────────────────┘
+                       │ git push (signed commits)
+                       ▼
+┌─────────────────────────────────────────────────────┐
+│                    TRUST ZONE 2                     │
+│                 Forgejo + CI Runner                 │
+│  • Holds CI SSH deploy key (DEPLOY_SSH_KEY secret)  │
+│  • Does NOT hold sops-nix age keys                  │
+│  • Branch protection: main requires PR + checks     │
+│  • Can only deploy via the deploy user              │
+│  • Builds are sandboxed in Nix                      │
+└──────────────────────┬──────────────────────────────┘
+                       │ SSH as "deploy" user
+                       ▼
+┌─────────────────────────────────────────────────────┐
+│                    TRUST ZONE 3                     │
+│                 Target NixOS Hosts                  │
+│  • deploy user: system user, no shell login         │
+│  • sudo: ONLY nix-env --set and                     │
+│    switch-to-configuration (NOPASSWD)               │
+│  • No write access to /etc, home dirs, etc.         │
+│  • sops secrets decrypted at activation via host    │
+│    age keys (not CI keys)                           │
+└─────────────────────────────────────────────────────┘
+```
+
+### What each actor can do
+
+| Actor | Can build | Can deploy | Can decrypt secrets | Can access hosts |
+|---|---|---|---|---|
+| Developer | ✅ | ✅ (manual) | ✅ (personal age keys) | ✅ (personal SSH) |
+| CI runner | ✅ | ✅ (deploy user) | ❌ | Limited (deploy user) |
+| deploy user | ❌ | ✅ (sudo restricted) | ❌ | N/A (runs on host) |
+| Host age key | ❌ | ❌ | ✅ (own secrets only) | N/A |
+
+### Hardening measures
+
+1. **Branch protection** on `main`: require PR, passing checks, optional signed commits
+2. **deploy user** ([`users/deploy/default.nix`](../../users/deploy/default.nix)): restricted sudoers, no home dir, system user
+3. **CI secret isolation**: SSH key only, no age keys in CI — secrets are decrypted on-host at activation time by sops-nix using host-specific age keys
+4. **Magic rollback**: if a deploy renders the host unreachable, deploy-rs auto-reverts within the confirm timeout
+5. **`nix flake check` + `deployChecks`**: validate the flake structure and deploy configuration before any deployment
+
+> [!NOTE]
+> The deploy user SSH key is stored as a Forgejo Actions secret. If the CI runner is compromised, the attacker still cannot obtain the sops-nix age keys held on developer workstations. However, the attacker can push arbitrary store paths and run `switch-to-configuration` as root, so they can activate any system configuration they build. Treat `DEPLOY_SSH_KEY` as equivalent to root access on every deployable host. The restricted sudoers rules limit which commands can be run, not what an activated configuration can do.
+
+---
+
+## Proposed Changes
+
+### 1. Flake configuration
+
+#### [MODIFY] [flake.nix](../../flake.nix)
+
+- Add `deploy-rs` to flake inputs
+- Auto-generate `deploy.nodes` from `self.nixosConfigurations` using `mapAttrs` — **zero duplication**
+- Add `checks` output via `deploy-rs.lib.deployChecks`
+- Define a helper that reads each host's `config.networking` for hostname/IP
+
+```nix
+# Sketch of the deploy output (no per-host duplication).
+# Assumes `self`, `deploy-rs` and `lib` (nixpkgs.lib) are in scope in `outputs`.
+deploy.nodes = builtins.mapAttrs (name: nixos: {
+  hostname = nixos.config.homelab.deploy.targetHost; # defined per host
+  sshUser = "deploy";
+  user = "root";
+  magicRollback = true;
+  autoRollback = true;
+  profiles.system = {
+    path = deploy-rs.lib.x86_64-linux.activate.nixos nixos;
+  };
+}) (lib.filterAttrs
+  (name: nixos: nixos.config.homelab.users.deploy.enable or false)
+  self.nixosConfigurations);
+
+# Run deploy-rs's schema and activation checks as part of `nix flake check`
+checks = builtins.mapAttrs
+  (system: deployLib: deployLib.deployChecks self.deploy)
+  deploy-rs.lib;
+```
+
+---
+
+### 2. Deploy user module
+
+#### [MODIFY] [default.nix](../../users/deploy/default.nix)
+
+- Add option `homelab.deploy.targetHost` (string, the IP/hostname for deploy-rs to SSH into)
+- Support multiple SSH authorized keys (CI key + personal workstation keys)
+- Add the deploy user to `nix.settings.trusted-users` (needed so deploy-rs can copy unsigned store paths to the host)
+
+---
+
+### 3. Enable deploy user on target hosts
+
+#### [MODIFY] Host `default.nix` files (per host)
+
+- Enable `homelab.users.deploy.enable = true` on all deployable hosts
+- Set `homelab.deploy.targetHost` to each host's IP (e.g., `"192.168.0.10"` for Ingress)
+- Currently only `Niko` has deploy enabled; extend to all non-`Template` hosts
+
+---
+
+### 4. Binary cache (Harmonia)
+
+#### [NEW] [modules/services/harmonia/default.nix](../../modules/services/harmonia/default.nix)
+
+- Create `homelab.services.harmonia` module wrapping `services.harmonia`
+- Generates a signing key pair for the cache
+- Configures Nginx reverse proxy with HTTPS (via ACME or internal cert)
+- All hosts configured to use the cache as a substituter via `nix.settings.substituters`
+
+> [!TIP]
+> Harmonia is chosen over Attic (simpler, no database needed) and nix-serve (better performance, streaming, zstd compression). It serves your `/nix/store` directly, so the CI runner can `nix copy` built closures to the cache host after a successful build.
+
+#### [NEW] [modules/common/nix-cache.nix](../../modules/common/nix-cache.nix)
+
+- Configure all hosts to use the binary cache as a substituter
+- Add the cache's public signing key to `trusted-public-keys`
+- Usable by personal devices too (add the cache URL + public key to their `nix.conf`)
+
+---
+
+### 5. CI Workflows
+
+#### [MODIFY] [build.yml](../../.github/workflows/build.yml)
+
+- Use the dynamic `determine-hosts` job output for the build matrix (already partially implemented)
+- Add `nix flake check` step for deployChecks validation
+- Build all hosts on every push/PR
+- Optionally push built closures to the Harmonia cache after successful build
+
+#### [NEW] [deploy.yml](../../.github/workflows/deploy.yml)
+
+- Trigger: push to `main` or `test-*` branches (after build passes)
+- Load `DEPLOY_SSH_KEY` from Forgejo Actions secrets
+- **For `main`**: `deploy .` (all hosts, `switch-to-configuration switch`)
+- **For `test-`**: deploy only the matching host with a **test profile** (`switch-to-configuration test`) — no bootloader update
+- Magic rollback enabled by default
+- Optional `--boot` mode for kernel upgrades (triggered by label or manual dispatch)
+
+#### [NEW] [check.yml](../../.github/workflows/check.yml)
+
+- Runs `nix flake check` (includes deployChecks)
+- Runs `nix eval` to validate all configurations parse correctly
+- Can be required as a status check for Renovate auto-merge rules
+
+---
+
+### 6. Monitoring
+
+#### [NEW] [modules/services/monitoring/default.nix](../../modules/services/monitoring/default.nix)
+
+- Enable node exporter on all hosts for Prometheus scraping
+- Export NixOS generation info: current generation, boot generation, system version
+- Optionally integrate with the existing infrastructure (e.g., Prometheus on Production)
+
+Script/service to export NixOS deploy state:
+```bash
+# Metrics like:
+# nixos_current_generation{host="Niko"} 42
+# nixos_boot_generation{host="Niko"} 42   # same = no pending reboot
+# nixos_config_age_seconds{host="Niko"} 3600
+```
+
+When `current_generation != boot_generation`, the host has a test deployment active (or needs a reboot).
+
+---
+
+### 7. Local VM Testing
+
+#### [NEW] [test/vm-test.nix](../../test/vm-test.nix)
+
+NixOS has built-in VM testing via `nixos-rebuild build-vm` and the NixOS test framework. The approach:
+
+1. **Build a VM from any host config**:
+   ```bash
+   nix build .#nixosConfigurations.Testing.config.system.build.vm
+   ./result/bin/run-Testing-vm
+   ```
+
+2. **NixOS integration test** (`test/vm-test.nix`):
+   - Spins up a minimal VM cluster (e.g., two nodes)
+   - Runs deploy-rs against one VM from the other
+   - Validates activation, rollback, and connectivity
+   - Uses the NixOS test framework (Python test driver)
+
+3. **Full CI pipeline test locally with `act`**:
+   ```bash
+   # Run the GitHub Actions workflow locally using act
+   act push --container-architecture linux/amd64
+   ```
+
+> [!NOTE]
+> The existing `build.yml` already uses `catthehacker/ubuntu:act-24.04` containers, the runner images built for `act`, so the workflows should also run locally under `act`. VM tests don't require actual network access to target hosts.
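+
+A starting point for `test/vm-test.nix` could look like the following. It is a sketch with several assumptions:
+- It uses nixpkgs' `pkgs.testers.runNixOSTest`.
+- It defines a stand-in `deploy` user inline instead of importing `users/deploy/default.nix`.
+- It only checks connectivity and user setup; running deploy-rs between the two nodes would be the next step.
+
+```nix
+# test/vm-test.nix (sketch): build and run with `nix-build test/vm-test.nix`
+{ pkgs ? import <nixpkgs> { } }:
+
+pkgs.testers.runNixOSTest {
+  name = "deploy-user";
+
+  nodes = {
+    # Plays the role of the CI runner
+    deployer = { ... }: { };
+
+    # Plays the role of a target host; stand-in for the real deploy user module
+    target = { ... }: {
+      services.openssh.enable = true;
+      users.groups.deploy = { };
+      users.users.deploy = {
+        isSystemUser = true;
+        group = "deploy";
+      };
+    };
+  };
+
+  # Python test driver: node names are also their hostnames on the test network
+  testScript = ''
+    start_all()
+    target.wait_for_unit("sshd.service")
+    target.succeed("id deploy")
+    deployer.wait_for_unit("multi-user.target")
+    deployer.succeed("ping -c 1 target")
+  '';
+}
+```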
+
+---
+
+## Verification Plan
+
+### Automated Tests
+- `nix flake check` — validates flake + deployChecks schema
+- `nix build .#nixosConfigurations..config.system.build.toplevel` for each host
+- NixOS VM integration test (`test/vm-test.nix`)
+
+### Manual Verification (guinea pig: `Development` or `Testing`)
+1. Push to `test-Development` → verify deploy-rs runs `switch-to-configuration test` on 192.168.0.91
+2. Reboot `Development` → verify it falls back to previous generation (test branch behavior)
+3. Merge to `main` → verify deploy-rs deploys to all enabled hosts with `switch`
+4. Intentionally break a config → verify magic rollback activates
+5. Push to Harmonia cache → verify another host can pull the closure
+6. Check monitoring metrics show correct generation numbers
diff --git a/docs/binary-cache/task.md b/docs/binary-cache/task.md
new file mode 100644
index 0000000..e7d88d3
--- /dev/null
+++ b/docs/binary-cache/task.md
@@ -0,0 +1,35 @@
+# NixOS CI/CD Deployment — Tasks
+
+## Planning
+- [x] Explore repository structure and existing CI workflow
+- [x] Confirm deploy-rs activation internals (`switch` vs `test` vs `boot`)
+- [x] Write comprehensive implementation plan
+- [x] User review and approval of plan
+
+## Networking & IP Refactor
+- [ ] Create `modules/common/networking.nix` with `homelab.networking.hostIp`
+- [ ] Update all host configs to use the new `hostIp` option
+- [ ] Update `deploy.nodes` to use `hostIp` instead of `targetHost` in deploy user module
+
+## Flake & deploy-rs Refinement
+- [ ] Review Nixpkgs #73404 status (is `cd /tmp` still needed?)
+- [ ] Refactor `flake.nix` to use `flake-utils-plus` passthrough (removing `//`)
+- [ ] Review `user = "root"` vs `sshUser = "deploy"` logic
+
+## Security & Trust (Refinement)
+- [ ] Add "Supply Chain Attacks" section to `SECURITY.md`
+- [ ] Document project assumptions in `SECURITY.md`
+
+## Local testing (Fixes)
+- [ ] Debug and fix `test/vm-test.nix` exit error
+- [ ] Verify test passes in WSL
+
+## CI Workflows
+- [x] Update `build.yml` with dynamic host matrix + `nix flake check`
+- [x] Create `deploy.yml` (main → switch, test-* → test activation)
+- [x] Create `check.yml` (deployChecks + eval validation)
+- [ ] Configure Forgejo secrets (DEPLOY_SSH_KEY)
+
+## Deferred (separate branches)
+- [ ] Binary cache (Harmonia) — module, nix-cache config, signing keys
+- [ ] Monitoring — NixOS generation exporter, node exporter per host
diff --git a/docs/binary-cache/walkthrough.md b/docs/binary-cache/walkthrough.md
new file mode 100644
index 0000000..07730a8
--- /dev/null
+++ b/docs/binary-cache/walkthrough.md
@@ -0,0 +1,55 @@
+# Walkthrough — NixOS CI/CD Deployment
+
+I have implemented a robust, automated deployment pipeline for your NixOS hosts using `deploy-rs`. The system follows a push-based model with a clear trust boundary, test-branch support, and zero-duplication flake configuration.
+
+## Key Changes
+
+### 1. Flake Integration (`flake.nix`)
+- Added `deploy-rs` input.
+- Added auto-generation of `deploy.nodes` from `nixosConfigurations`. Only hosts with `homelab.users.deploy.enable = true` and a `targetHost` IP are included.
+- Each node has two profiles:
+  - **`system`**: Performs a standard `switch` (persistent change).
+  - **`test`**: Performs a `test` activation (non-persistent, falls back on reboot).
+- Added `deployChecks` to `flake.nix` checks.
+
+### 2. Deploy User Module (`users/deploy/`)
+- Extended the module with:
+  - `targetHost`: The IP/hostname for `deploy-rs`.
+  - `authorizedKeys`: Support for multiple SSH keys (CI + personal).
+  - Added `nix.settings.trusted-users = [ "deploy" ]` so the user can push store paths.
+  - Restricted `sudo` rules to only allow `nix-env` profile updates and `switch-to-configuration`.
+
+### 3. Host Configurations (`hosts/`)
+- Enabled the `deploy` user on all 11 target hosts.
+- Mapped all host IPs based on your existing configurations.
+
+### 4. CI/CD Workflows (`.github/workflows/`)
+- **`check.yml`**: Runs `nix flake check` on every push.
+- **`build.yml`**: Dynamically discovers all hosts and builds them in a matrix.
+- **`deploy.yml`**:
+  - Pushes to `main` → Deploys `system` profile (switch) to all affected hosts.
+  - Pushes to `test-` → Deploys `test` profile to that specific host.
+
+### 5. Documentation & Testing
+- **[SECURITY.md](../../SECURITY.md)**: Documents the trust boundaries between you, the CI, and the hosts.
+- **[README.md](../../README.md)**: Deployment and local testing instructions.
+- **`test/vm-test.nix`**: A NixOS integration test to verify the deploy user setup.
+
+## Next Steps for You
+
+1. **Configure Forgejo Secrets**:
+   - Generate an SSH key for the CI.
+   - Add the **Public Key** to `users/deploy/default.nix` (I added a placeholder; replace it with the real key).
+   - Add the **Private Key** as a Forgejo secret named `DEPLOY_SSH_KEY`.
+2. **Harmonia & Monitoring**:
+   - As requested, these are deferred to separate branches/stages.
+   - The `SECURITY.md` already accounts for a binary cache zone.
+
+## Verification
+
+I've manually verified the logic and Nix syntax. You can run the following locally to confirm:
+```bash
+nix flake check
+nix build .#nixosConfigurations.Development.config.system.build.toplevel
+nix-build test/vm-test.nix
+```