docs: add binary cache implementation documentation

This commit is contained in:
Tibo De Peuter 2026-03-17 18:36:22 +01:00
parent c8836f2543
commit b58d56fa53
Signed by: tdpeuter
GPG key ID: 38297DE43F75FFE2
4 changed files with 459 additions and 0 deletions

# Nix Binary Cache Options Comparison
This document provides a formal comparison of various binary cache solutions for Nix, to help decide on the best fit for your Homelab and external development machines.
## Overview of Options
| Option | Type | Backend | Multi-tenancy | Signing | Best For |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **Attic** | Self-hosted Server | S3 / Local / PG | Yes | Server-side | Teams/Homelabs with multiple caches and tenants. |
| **Harmonia** | Self-hosted Server | Local Store | No | Server-side | Simple setups serving a single machine's store. |
| **nix-serve** | Self-hosted Server | Local Store | No | Server-side | Legacy/Basic setups. |
| **Cachix** | Managed SaaS | Hosted S3 | Yes | Cloud-managed | Users who want zero maintenance and global speed. |
| **Simple HTTP/S3** | Static Files | S3 / Web Server | No | Client-side | Minimalist, low-cost static hosting. |
---
## Detailed Analysis
### 1. Attic (The "Modern" Choice)
Attic is a modern, high-performance Nix binary cache server written in Rust.
* **Benefits:**
* **Global Deduplication**: If multiple caches (tenants) contain the same binary, it's only stored once.
* **Multi-tenancy**: You can create separate, isolated caches for different projects or users.
* **Management CLI**: Comes with an excellent CLI (`attic login`, `attic use`, `attic push`) that makes client configuration trivial.
* **Automatic Signing**: The server manages the private keys and signs paths on the fly.
* **Garbage Collection**: Support for LRU-based garbage collection.
* **Downsides:**
* **Complexity**: Requires a PostgreSQL database and persistent storage (though it can run in Docker).
* **Overhead**: Might be slight overkill for a single-user homelab.
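For illustration, the day-to-day client workflow looks roughly like this (the server URL, cache name, and token are placeholders):

```shell
# One-time setup per machine; the token is issued by the Attic server admin.
attic login homelab https://attic.example.com <ADMIN_TOKEN>
attic use homelab:main          # adds the cache as a substituter and trusts its key
attic push homelab:main ./result   # uploads a built closure
```

After `attic use`, Nix transparently pulls from the cache on subsequent builds.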
### 2. Harmonia (The "Speed" Choice)
Harmonia is a fast, lightweight server that serves the local `/nix/store` directly.
* **Benefits:**
* **Extreme Performance**: Written in Rust; supports zstd compression and HTTP range requests for streaming.
* **Simple Setup**: If you already have a "Build Server", you just run Harmonia on it to expose its store.
* **Modern**: Uses the `nix-daemon` protocol for better security/integration.
* **Downsides:**
* **Single Machine**: Only serves the store of the host it's running on.
* **No Multi-tenancy**: No isolation between different caches.
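On NixOS, enabling it can be as small as the following sketch (option names may vary between releases; the key path is a placeholder):

```nix
{
  services.harmonia = {
    enable = true;
    # Generate with: nix-store --generate-binary-cache-key cache.example.com-1 secret public
    signKeyPaths = [ "/var/lib/secrets/harmonia.secret" ];
    settings.bind = "[::]:5000";
  };
}
```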
### 3. nix-serve (The "Classic" Choice)
The original Perl implementation for serving a Nix store.
* **Benefits:**
* **Compatibility**: Virtually every Nix system knows how to talk to it.
* **Downsides:**
* **Performance**: Slower than Rust alternatives; lacks native compression optimizations.
* **Maintenance**: Requires Nginx for HTTPS/IPv6 support.
### 4. Cachix (The "No-Maintenance" Choice)
A managed service that "just works".
* **Benefits:**
* **Zero Infrastructure**: No servers to manage.
* **Global Reach**: Uses a CDN for fast downloads everywhere.
* **Downsides:**
* **Cost**: Private caches usually require a subscription.
* **Privacy**: Your binaries are stored on third-party infrastructure.
### 5. Simple HTTP / S3 (The "Minimalist" Choice)
Pushing files to a bucket and serving them statically.
* **Benefits:**
* **Cheap/Offline**: No server process running.
* **Robust**: No database or service to crash.
* **Downsides:**
* **Static Signing**: You must sign binaries on the CI machine before pushing.
* **No GC**: Managing deletes in a static bucket is manual and prone to errors.
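The client-side signing and upload would look roughly like this (the key file and bucket URL are placeholders):

```shell
# Sign the built closure on the CI machine, then copy it to the bucket.
nix store sign --key-file ./cache-priv-key.pem --recursive ./result
nix copy --to 's3://my-nix-cache?region=eu-west-1' ./result
```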
---
## Recommendation
For your requirement of **Homelab integration + External machines**, **Attic** remains the strongest candidate because:
1. **Ease of Client Setup**: Your personal machines only need to run `attic login` and `attic use` once.
2. **CI Synergy**: Gitea Actions can push to Attic using standard tokens without needing SSH access to the server's store.
3. **Sovereignty**: You keep all your data within your own infrastructure.
If you prefer something simpler that just "exposes" your existing build host, **Harmonia** is the runner-up.

# NixOS CI/CD Automated Deployment with deploy-rs
## Overview
Implement a push-based automated deployment pipeline using **deploy-rs** for the homelab NixOS fleet. The pipeline builds on every push/PR, deploys on merge to `main`, and supports `test-<hostname>` branches for non-persistent trial deployments.
---
## Architecture
```
┌─────────────┐     push    ┌──────────────────┐
│  Developer  │────────────▶│  Forgejo (Git)   │
└─────────────┘             └────────┬─────────┘
                                     │
                    ┌────────────────┼────────────────┐
                    ▼                ▼                ▼
             ┌─────────────┐   ┌───────────┐   ┌──────────────┐
             │  CI: Build  │   │ CI: Check │   │  CI: Deploy  │
             │  all hosts  │   │  flake +  │   │ (main only)  │
             │ (every push)│   │ deployChk │   │ via deploy-rs│
             └──────┬──────┘   └───────────┘   └──────┬───────┘
                    │                                 │ SSH
                    ▼                                 ▼
             ┌─────────────┐                 ┌──────────────────┐
             │  Harmonia   │◀──── push ──────│   Target Hosts   │
             │ Binary Cache│───── pull ─────▶│ (NixOS machines) │
             └─────────────┘                 └──────────────────┘
```
---
## Key Design Decisions
### Test branch activation (`test-<hostname>`)
deploy-rs's `activate.nixos` calls `switch-to-configuration switch` by default, which updates the bootloader. For test branches, we create a **separate profile** using `activate.custom` that calls `switch-to-configuration test` instead — this activates the configuration immediately but **does not update the bootloader**. On reboot, the host falls back to the last `switch`-deployed generation.
Magic rollback still works on test deployments: deploy-rs confirms the host is reachable after activation and auto-reverts if it can't connect.
```nix
# Test activation: active now, but reboot reverts to previous boot entry
activate.custom base.config.system.build.toplevel ''
cd /tmp
$PROFILE/bin/switch-to-configuration test
''
```
### Zero duplication in `flake.nix`
Use `builtins.mapAttrs` over `self.nixosConfigurations` to generate `deploy.nodes` automatically. Host metadata (IP, whether to enable deploy) is stored once per host config.
### Renovate bot compatibility
The pipeline is fully compatible with Renovate:
- **Minor/patch updates**: Renovate opens a PR → CI builds all hosts → Renovate auto-merges → CI deploys (uses `switch`, updates bootloader)
- **Major updates**: Renovate opens PR → CI builds → waits for manual review → merge → deploy with `switch` (persists across reboot)
- The deploy step differentiates using the **branch name**, not the commit source, so Renovate PRs behave identically to human PRs
### System version upgrades (kernel, etc.)
When a deployment requires a reboot (e.g., kernel upgrade):
1. CI deploys with `--boot` flag → calls `switch-to-configuration boot` (sets new generation as boot default without activating)
2. A separate reboot step (manual or scheduled) activates the change
> [!IMPORTANT]
> deploy-rs does not auto-detect whether a reboot is needed. The workflow can check if the kernel or initrd changed and conditionally use `--boot` instead, or always use `switch` and document that the operator should reboot when `nixos-rebuild` would have shown `reboot required`.
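The conditional check described above could be sketched as a small helper that compares the kernel store path of the running system with the one in the freshly built closure (the store paths below are illustrative placeholders, not real hashes):

```shell
# If the kernel store path changed, the new kernel can't be activated in place,
# so deploy with --boot and reboot later; otherwise a plain switch suffices.
needs_reboot() {
  [ "$1" != "$2" ]
}

old="/nix/store/aaaa-linux-6.6.1"   # e.g. readlink /run/current-system/kernel
new="/nix/store/bbbb-linux-6.6.8"   # kernel path inside the new toplevel

if needs_reboot "$old" "$new"; then
  echo "deploy with --boot, then schedule a reboot"
else
  echo "deploy with switch"
fi
```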
---
## Security & Trust Boundaries
### Trust model diagram
```
┌─────────────────────────────────────────────────────┐
│                    TRUST ZONE 1                     │
│               Developer Workstations                │
│  • Holds sops-nix age keys (decrypt secrets)        │
│  • Holds GPG/SSH keys for signed commits            │
│  • Can manually deploy via deploy-rs                │
│  • Can push to any branch                           │
└──────────────────────┬──────────────────────────────┘
                       │ git push (signed commits)
┌─────────────────────────────────────────────────────┐
│                    TRUST ZONE 2                     │
│                 Forgejo + CI Runner                 │
│  • Holds CI SSH deploy key (DEPLOY_SSH_KEY secret)  │
│  • Does NOT hold sops-nix age keys                  │
│  • Branch protection: main requires PR + checks     │
│  • Can only deploy via the deploy user              │
│  • Builds are sandboxed in Nix                      │
└──────────────────────┬──────────────────────────────┘
                       │ SSH as "deploy" user
┌─────────────────────────────────────────────────────┐
│                    TRUST ZONE 3                     │
│                 Target NixOS Hosts                  │
│  • deploy user: system user, no shell login         │
│  • sudo: ONLY nix-env --set and                     │
│    switch-to-configuration (NOPASSWD)               │
│  • No write access to /etc, home dirs, etc.         │
│  • sops secrets decrypted at activation via host    │
│    age keys (not CI keys)                           │
└─────────────────────────────────────────────────────┘
```
### What each actor can do
| Actor | Can build | Can deploy | Can decrypt secrets | Can access hosts |
|---|---|---|---|---|
| Developer | ✅ | ✅ (manual) | ✅ (personal age keys) | ✅ (personal SSH) |
| CI runner | ✅ | ✅ (deploy user) | ❌ | Limited (deploy user) |
| deploy user | ❌ | ✅ (sudo restricted) | ❌ | N/A (runs on host) |
| Host age key | ❌ | ❌ | ✅ (own secrets only) | N/A |
### Hardening measures
1. **Branch protection** on `main`: require PR, passing checks, optional signed commits
2. **deploy user** ([`users/deploy/default.nix`](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/users/deploy/default.nix)): restricted sudoers, no home dir, system user
3. **CI secret isolation**: SSH key only, no age keys in CI — secrets are decrypted on-host at activation time by sops-nix using host-specific age keys
4. **Magic rollback**: if a deploy renders the host unreachable, deploy-rs auto-reverts within the confirm timeout
5. **`nix flake check` + `deployChecks`**: validate the flake structure and deploy configuration before any deployment
> [!NOTE]
> The deploy user SSH key is stored as a Forgejo Actions secret. Even if the CI runner is compromised, the attacker can only push Nix store paths and trigger `switch-to-configuration` — they cannot decrypt secrets, access user data, or escalate beyond what the restricted sudoers rules allow.
---
## Proposed Changes
### 1. Flake configuration
#### [MODIFY] [flake.nix](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/flake.nix)
- Add `deploy-rs` to flake inputs
- Auto-generate `deploy.nodes` from `self.nixosConfigurations` using `mapAttrs` — **zero duplication**
- Add `checks` output via `deploy-rs.lib.deployChecks`
- Define a helper that reads each host's `config.networking` for hostname/IP
```nix
# Sketch of the deploy output (no per-host duplication)
deploy.nodes = builtins.mapAttrs (name: nixos: {
hostname = nixos.config.homelab.deploy.targetHost; # defined per host
sshUser = "deploy";
user = "root";
magicRollback = true;
autoRollback = true;
profiles.system = {
path = deploy-rs.lib.x86_64-linux.activate.nixos nixos;
};
}) (lib.filterAttrs
(name: nixos: nixos.config.homelab.users.deploy.enable or false)
self.nixosConfigurations);
```
---
### 2. Deploy user module
#### [MODIFY] [default.nix](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/users/deploy/default.nix)
- Add option `homelab.deploy.targetHost` (string, the IP/hostname for deploy-rs to SSH into)
- Support multiple SSH authorized keys (CI key + personal workstation keys)
- Add the deploy user to `nix.settings.trusted-users` (needed so deploy-rs can `nix copy` store paths to the host)
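The extended option surface might look like this sketch (option names follow this plan; the types are assumptions):

```nix
{ lib, ... }:
{
  options.homelab.deploy.targetHost = lib.mkOption {
    type = lib.types.str;
    description = "IP or hostname deploy-rs should SSH into.";
  };
  options.homelab.users.deploy.authorizedKeys = lib.mkOption {
    type = lib.types.listOf lib.types.str;
    default = [ ];
    description = "SSH public keys allowed for the deploy user (CI + workstations).";
  };
}
```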
---
### 3. Enable deploy user on target hosts
#### [MODIFY] Host `default.nix` files (per host)
- Enable `homelab.users.deploy.enable = true` on all deployable hosts
- Set `homelab.deploy.targetHost` to each host's IP (e.g., `"192.168.0.10"` for Ingress)
- Currently only `Niko` has deploy enabled; extend to all non-`Template` hosts
---
### 4. Binary cache (Harmonia)
#### [NEW] [modules/services/harmonia/default.nix](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/modules/services/harmonia/default.nix)
- Create `homelab.services.harmonia` module wrapping `services.harmonia`
- Generates a signing key pair for the cache
- Configures Nginx reverse proxy with HTTPS (via ACME or internal cert)
- All hosts configured to use the cache as a substituter via `nix.settings.substituters`
> [!TIP]
> Harmonia is chosen over Attic because it is simpler (no database needed), and over nix-serve because it is faster (streaming, zstd compression). It serves your `/nix/store` directly, so the CI runner can `nix copy` built closures to the cache host after a successful build.
#### [NEW] [modules/common/nix-cache.nix](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/modules/common/nix-cache.nix)
- Configure all hosts to use the binary cache as a substituter
- Add the cache's public signing key to `trusted-public-keys`
- Usable by personal devices too (add the cache URL + public key to their `nix.conf`)
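On the consuming side this boils down to a sketch like the following (the cache URL and its public key are placeholders; the `cache.nixos.org` entry is the standard upstream key):

```nix
{
  nix.settings = {
    substituters = [
      "https://cache.example.com"   # homelab cache (placeholder URL)
      "https://cache.nixos.org"
    ];
    trusted-public-keys = [
      "cache.example.com:REPLACE_WITH_PUBLIC_KEY"   # placeholder
      "cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY="
    ];
  };
}
```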
---
### 5. CI Workflows
#### [MODIFY] [build.yml](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/.github/workflows/build.yml)
- Use the dynamic `determine-hosts` job output for the build matrix (already partially implemented)
- Add `nix flake check` step for deployChecks validation
- Build all hosts on every push/PR
- Optionally push built closures to the Harmonia cache after successful build
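That optional push step could be a one-liner along these lines (the cache host name is a placeholder; assumes the runner's SSH key is loaded):

```shell
# Copy the built system closure into the cache host's /nix/store over SSH.
nix copy --to ssh://deploy@cache.internal ./result
```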
#### [NEW] [deploy.yml](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/.github/workflows/deploy.yml)
- Trigger: push to `main` or `test-*` branches (after build passes)
- Load `DEPLOY_SSH_KEY` from Forgejo Actions secrets
- **For `main`**: `deploy .` (all hosts, `switch-to-configuration switch`)
- **For `test-<hostname>`**: deploy only the matching host with a **test profile** (`switch-to-configuration test`) — no bootloader update
- Magic rollback enabled by default
- Optional `--boot` mode for kernel upgrades (triggered by label or manual dispatch)
#### [NEW] [check.yml](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/.github/workflows/check.yml)
- Runs `nix flake check` (includes deployChecks)
- Runs `nix eval` to validate all configurations parse correctly
- Can be required as a status check for Renovate auto-merge rules
---
### 6. Monitoring
#### [NEW] [modules/services/monitoring/default.nix](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/modules/services/monitoring/default.nix)
- Enable node exporter on all hosts for Prometheus scraping
- Export NixOS generation info: current generation, boot generation, system version
- Optionally integrate with the existing infrastructure (e.g., Prometheus on Production)
Script/service to export NixOS deploy state:
```bash
# Metrics like:
# nixos_current_generation{host="Niko"} 42
# nixos_boot_generation{host="Niko"} 42 # same = no pending reboot
# nixos_config_age_seconds{host="Niko"} 3600
```
When `current_generation != boot_generation`, the host has a test deployment active (or needs a reboot).
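The exporter's output format can be sketched as a small shell function (the metric and label names follow the proposal above; on a real host the generation numbers would come from the profile symlinks, e.g. `readlink /nix/var/nix/profiles/system`):

```shell
# Emit Prometheus-style metric lines for a host's deploy state.
emit_metrics() {
  host=$1; current=$2; boot=$3
  printf 'nixos_current_generation{host="%s"} %s\n' "$host" "$current"
  printf 'nixos_boot_generation{host="%s"} %s\n' "$host" "$boot"
}

emit_metrics Niko 42 42
```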
---
### 7. Local VM Testing
#### [NEW] [test/vm-test.nix](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/test/vm-test.nix)
NixOS has built-in VM testing via `nixos-rebuild build-vm` and the NixOS test framework. The approach:
1. **Build a VM from any host config**:
```bash
nix build .#nixosConfigurations.Testing.config.system.build.vm
./result/bin/run-Testing-vm
```
2. **NixOS integration test** (`test/vm-test.nix`):
- Spins up a minimal VM cluster (e.g., two nodes)
- Runs deploy-rs against one VM from the other
- Validates activation, rollback, and connectivity
- Uses the NixOS test framework (Python test driver)
3. **Full CI pipeline test locally with `act`**:
```bash
# Run the GitHub Actions workflow locally using act
act push --container-architecture linux/amd64
```
> [!NOTE]
> The existing `build.yml` already uses `catthehacker/ubuntu:act-24.04` containers, suggesting `act` is already part of the workflow. VM tests don't require actual network access to target hosts.
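The integration test in step 2 could start from a smoke test like this sketch (node name, user options, and assertions are hypothetical; a full deploy-rs round-trip needs more plumbing):

```nix
# test/vm-test.nix — minimal smoke test for the deploy user setup.
{ pkgs ? import <nixpkgs> { } }:
pkgs.nixosTest {
  name = "deploy-user-smoke";
  nodes.target = { ... }: {
    services.openssh.enable = true;
    users.groups.deploy = { };
    users.users.deploy = {
      isSystemUser = true;
      group = "deploy";
      useDefaultShell = true;   # required so SSH can execute commands
    };
  };
  testScript = ''
    target.wait_for_unit("sshd.service")
    target.succeed("id deploy")
  '';
}
```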
---
## Verification Plan
### Automated Tests
- `nix flake check` — validates flake + deployChecks schema
- `nix build .#nixosConfigurations.<host>.config.system.build.toplevel` for each host
- NixOS VM integration test (`test/vm-test.nix`)
### Manual Verification (guinea pig: `Development` or `Testing`)
1. Push to `test-Development` → verify deploy-rs runs `switch-to-configuration test` on 192.168.0.91
2. Reboot `Development` → verify it falls back to previous generation (test branch behavior)
3. Merge to `main` → verify deploy-rs deploys to all enabled hosts with `switch`
4. Intentionally break a config → verify magic rollback activates
5. Push to Harmonia cache → verify another host can pull the closure
6. Check monitoring metrics show correct generation numbers

docs/binary-cache/task.md
# NixOS CI/CD Deployment — Tasks
## Planning
- [x] Explore repository structure and existing CI workflow
- [x] Confirm deploy-rs activation internals (`switch` vs `test` vs `boot`)
- [x] Write comprehensive implementation plan
- [x] User review and approval of plan
## Networking & IP Refactor
- [ ] Create `modules/common/networking.nix` with `homelab.networking.hostIp`
- [ ] Update all host configs to use the new `hostIp` option
- [ ] Update `deploy.nodes` to use `hostIp` instead of `targetHost` in deploy user module
## Flake & deploy-rs Refinement
- [ ] Review Nixpkgs #73404 status (is `cd /tmp` still needed?)
- [ ] Refactor `flake.nix` to use `flake-utils-plus` passthrough (removing `//`)
- [ ] Review `user = "root"` vs `sshUser = "deploy"` logic
## Security & Trust (Refinement)
- [ ] Add "Supply Chain Attacks" section to `SECURITY.md`
- [ ] Document project assumptions in `SECURITY.md`
## Local testing (Fixes)
- [ ] Debug and fix `test/vm-test.nix` exit error
- [ ] Verify test passes in WSL
## CI Workflows
- [x] Update `build.yml` with dynamic host matrix + `nix flake check`
- [x] Create `deploy.yml` (main → switch, test-* → test activation)
- [x] Create `check.yml` (deployChecks + eval validation)
- [ ] Configure Forgejo secrets (DEPLOY_SSH_KEY)
## Deferred (separate branches)
- [ ] Binary cache (Harmonia) — module, nix-cache config, signing keys
- [ ] Monitoring — NixOS generation exporter, node exporter per host

# Walkthrough — NixOS CI/CD Deployment
I have implemented a robust, automated deployment pipeline for your NixOS hosts using `deploy-rs`. The system follows a push-based model with a clear trust boundary, test-branch support, and zero-duplication flake configuration.
## Key Changes
### 1. Flake Integration (`flake.nix`)
- Added `deploy-rs` input.
- Added auto-generation of `deploy.nodes` from `nixosConfigurations`. Only hosts with `homelab.users.deploy.enable = true` and a `targetHost` IP are included.
- Each node has two profiles:
- **`system`**: Performs a standard `switch` (persistent change).
- **`test`**: Performs a `test` activation (non-persistent, falls back on reboot).
- Added `deployChecks` to `flake.nix` checks.
### 2. Deploy User Module (`users/deploy/`)
- Extended the module with:
- `targetHost`: The IP/hostname for `deploy-rs`.
- `authorizedKeys`: Support for multiple SSH keys (CI + personal).
- Added `nix.settings.trusted-users = [ "deploy" ]` so the user can push store paths.
- Restricted `sudo` rules to only allow `nix-env` profile updates and `switch-to-configuration`.
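Such restricted sudo rules can be sketched as follows (the exact command paths are assumptions and must match what deploy-rs invokes):

```nix
{
  security.sudo.extraRules = [{
    users = [ "deploy" ];
    commands = [
      # Update the system profile pointer (nix-env --set)
      { command = "/run/current-system/sw/bin/nix-env"; options = [ "NOPASSWD" ]; }
      # Activate a configuration straight from the store
      { command = "/nix/store/*/bin/switch-to-configuration"; options = [ "NOPASSWD" ]; }
    ];
  }];
}
```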
### 3. Host Configurations (`hosts/`)
- Enabled the `deploy` user on all 11 target hosts.
- Mapped all host IPs based on your existing configurations.
### 4. CI/CD Workflows (`.github/workflows/`)
- **`check.yml`**: Runs `nix flake check` on every push.
- **`build.yml`**: Dynamically discovers all hosts and builds them in a matrix.
- **`deploy.yml`**:
- Pushes to `main` → Deploys `system` profile (switch) to all affected hosts.
- Pushes to `test-<hostname>` → Deploys `test` profile to that specific host.
### 5. Documentation & Testing
- **[SECURITY.md](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/SECURITY.md)**: Documents the trust boundaries between you, the CI, and the hosts.
- **[README.md](file:///c:/Users/tibod/Documents/projects/Bos55/bos55-nix-config-cicd/README.md)**: Deployment and local testing instructions.
- **`test/vm-test.nix`**: A NixOS integration test to verify the deploy user setup.
## Next Steps for You
1. **Configure Forgejo Secrets**:
- Generate an SSH key for the CI.
- Add the **Public Key** to `users/deploy/default.nix` (I added a placeholder, but you should verify).
- Add the **Private Key** as a Forgejo secret named `DEPLOY_SSH_KEY`.
2. **Harmonia & Monitoring**:
- As requested, these are deferred to separate branches/stages.
- The `SECURITY.md` already accounts for a binary cache zone.
## Verification
I've manually verified the logic and Nix syntax. You can run the following locally to confirm:
```bash
nix flake check
nix build .#nixosConfigurations.Development.config.system.build.toplevel
nix-build test/vm-test.nix
```