Automated deployments #2
I would like to implement automated (semi-automatic) deployments of the NixOS machines. This would reduce the effort of deploying changes to (all) the affected hosts, including key management. In my opinion, putting a system in place and clearly defining the trust boundaries would be beneficial.
Using GitHub/Forgejo Actions, I have the following approach(es) in mind:
Push-based system
Features/changes to the config are implemented in their own separate branches. On every push, the modified derivations are built to check that they are valid. Once merged into the main branch, or when pushed to a test-<hostname> branch, the changes are pushed to the host and the services are restarted. The changes would be built on a build host with additional resources (mainly memory).
Caveats would be:
Pull-based system
Features/changes to the config are implemented in their own separate branches, though 'hotfixes' should be possible as well. On every push, the modified derivations can be built to check that they are valid, but in this case I think that is less appropriate. Once merged into the main branch, or when pushed to a test-<hostname> branch, either a) the host periodically checks for changes (using system.autoUpgrade?) and pulls them in, or b) some kind of webhook system lets the host(s) know to pull the new changes. The changes would either be built on the systems themselves, to spread the resource consumption, or be built once again on a centralized build host.
Caveats I can think of:
Discussion
In both systems, the main branch should be protected with appropriate rules. I think requiring signed commits and verifying those signatures is appropriate as well.
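Whichever model is chosen, the branch conventions described above reduce to a small decision. A minimal sketch, assuming a hypothetical host list and the test-<hostname> naming from this issue: `plan_for_branch` is what a push-based CI job would evaluate, and `next_revision` is what a host would evaluate in the pull-based model (from a timer or a webhook handler). `Plan`, `plan_for_branch` and `next_revision` are illustrative names; nothing here calls a real deployment tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

TEST_PREFIX = "test-"  # branch naming convention from this issue


@dataclass(frozen=True)
class Plan:
    """What a push-based CI run should do (hypothetical model)."""
    build: Tuple[str, ...]   # hosts whose system configuration gets built
    deploy: Tuple[str, ...]  # hosts that receive the new configuration


def plan_for_branch(branch: str, hosts: List[str]) -> Plan:
    """Push model: decide what to build and deploy for a pushed branch."""
    # Build every host on every push; hosts whose configuration did not change
    # are typically cheap if the build host already has their store paths.
    all_hosts = tuple(sorted(hosts))
    if branch == "main":
        return Plan(build=all_hosts, deploy=all_hosts)
    if branch.startswith(TEST_PREFIX) and branch[len(TEST_PREFIX):] in hosts:
        return Plan(build=all_hosts, deploy=(branch[len(TEST_PREFIX):],))
    # Feature branches (and test-* branches for unknown hosts): build only.
    return Plan(build=all_hosts, deploy=())


def next_revision(hostname: str, heads: Dict[str, str],
                  deployed: Optional[str]) -> Optional[str]:
    """Pull model: return the commit this host should switch to, or None."""
    # A test branch for this host takes precedence over main.
    branch = TEST_PREFIX + hostname
    if branch not in heads:
        branch = "main"
    wanted = heads.get(branch)
    if wanted is None or wanted == deployed:
        return None  # already up to date, or nothing to follow
    return wanted
```

For example, `plan_for_branch("test-web", ["nas", "web"])` builds both hosts but deploys only to `web`. In the pull model, a host would pass in the branch heads it fetched (e.g. with `git ls-remote`) and its currently deployed commit, and could verify the commit signature before switching. One consequence of the precedence rule: a stale test-<hostname> branch would keep the host pinned, so such branches should be deleted after merging.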
In both cases, rollbacks would be a challenge to implement myself.
deploy-rs implements a magic rollback (see the deploy-rs documentation).
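As far as I understand the idea behind a magic rollback: after activating the new generation, the deployer must reconnect to the host and confirm within a timeout; if that confirmation never arrives (for example because the change broke SSH or networking), the host reverts to the previous generation on its own. A minimal sketch of only that control flow; `activate`, `rollback` and the `confirmed` event are hypothetical placeholders, not deploy-rs's actual code.

```python
import threading
from typing import Callable


def activate_with_rollback(
    activate: Callable[[], None],
    rollback: Callable[[], None],
    confirmed: threading.Event,
    timeout_s: float,
) -> bool:
    """Activate a new configuration and revert it unless confirmed in time.

    Returns True if the new configuration was kept.
    """
    activate()
    # Some other party (the deployer reconnecting) is expected to set `confirmed`.
    if confirmed.wait(timeout=timeout_s):
        return True
    # No confirmation arrived: assume the host became unreachable and revert.
    rollback()
    return False


if __name__ == "__main__":
    confirmed = threading.Event()
    # Simulate a deployer that reconnects and confirms after 0.1 s.
    threading.Timer(0.1, confirmed.set).start()
    kept = activate_with_rollback(
        lambda: print("activated"), lambda: print("rolled back"), confirmed, timeout_s=1.0
    )
    print("kept" if kept else "reverted")
```

The important property is that the revert is triggered on the host itself, so it still works when the change has cut the deployer off.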
Continuous integration with GitHub Actions
Setting up distributed builds
Deployment tools: colmena, deploy-rs, comin
Forgejo Actions Reference
NixOS deployment: from push to pull
Remote deployment (NixOS & Flakes Book)
CI/CD rebuilds via github
Proxmox Cloud Init Support
The full stack should be provisioned automatically as well: when a merge defines a new VM, it should be provisioned on Proxmox using the API. Is this just reimplementing Kubernetes myself? Is it useful to do this? Also, is this useful given the modularity of VMs, or should containers be used on beefier machines instead? I guess this is still useful when you need to provision multiple worker nodes that scale up and down and spin up and down.
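At its core, "provision on merge" is a reconciliation step: compare the VMs declared in the repository with the VMs that exist on Proxmox and derive what to create, change or remove. Running that comparison in a loop is essentially what Kubernetes controllers do, which is where the overlap with Kubernetes comes from. A minimal sketch; the spec format is hypothetical and no Proxmox API is called.

```python
from typing import Any, Dict, List

# Hypothetical VM spec, e.g. {"cores": 2, "memory_mb": 4096}
Spec = Dict[str, Any]


def reconcile(desired: Dict[str, Spec], existing: Dict[str, Spec]) -> Dict[str, List[str]]:
    """Compare declared VMs with running ones and list the required actions."""
    return {
        # Declared in the repository but not present on the hypervisor.
        "create": sorted(desired.keys() - existing.keys()),
        # Present on the hypervisor but no longer declared.
        "destroy": sorted(existing.keys() - desired.keys()),
        # Present in both, but the spec changed.
        "update": sorted(
            name for name in desired.keys() & existing.keys()
            if desired[name] != existing[name]
        ),
    }


if __name__ == "__main__":
    desired = {"web": {"cores": 2}, "db": {"cores": 4}}
    existing = {"web": {"cores": 1}, "old": {"cores": 1}}
    print(reconcile(desired, existing))
```

For the example above this yields `create: ["db"]`, `destroy: ["old"]` and `update: ["web"]`. Acting automatically on the `destroy` list is the risky part, since the VM's disks and datasets go with it; requiring manual approval for destroys seems sensible.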
When the PR is opened, a new VM is spun up from a minimal ISO; I think that is the best approach.
All credentials should be put in place at the beginning. But how should truly unique passwords be handled? Is it feasible not to set a password at all and, for example, only use SSH keys?
I think it might be interesting to look into implementing this, but at what point does it just become a reimplementation of already existing orchestration tools?
I think it is possible to link the NixOS configuration to a VM by using the machine ID and the VM ID in Proxmox. But how can collisions be avoided?
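One way to avoid collisions is to derive the Proxmox VM ID deterministically from the hostname and skip IDs that are already taken. A minimal sketch, assuming a hypothetical ID range of 1000-9999 (Proxmox VM IDs start at 100) and a set of used IDs that has been fetched elsewhere; `allocate_vmid` is an illustrative name.

```python
import hashlib
from typing import Set


def allocate_vmid(hostname: str, used: Set[int], lo: int = 1000, hi: int = 9999) -> int:
    """Pick a stable VM ID for `hostname`, skipping IDs that are already in use."""
    size = hi - lo + 1
    # Stable hash (unlike Python's built-in hash(), which is salted per process).
    digest = hashlib.sha256(hostname.encode()).digest()
    start = int.from_bytes(digest[:8], "big") % size
    # Linear probing: walk forward (wrapping around) until a free ID is found.
    for offset in range(size):
        candidate = lo + (start + offset) % size
        if candidate not in used:
            return candidate
    raise RuntimeError("no free VM IDs left in range")
```

Because the probing depends on which IDs happen to be taken at the time, the allocated ID should be written back into the repository (pinned) after the first allocation rather than recomputed on every run. Test VMs for pull requests could draw from a separate range so they can never collide with production VMs.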
I also thought about using test VMs to test new VM configurations without pushing them to the machine yet. But how can this work when the test VM runs on the CI/CD infrastructure? How can the developer try the VM out? Again, how can state collisions with the actual VM running the production services be avoided? Do datasets need to be duplicated as well?
Also, with RenovateBot etc., which opens pull requests when new container images are available, is it possible to recognize whether these container images address security issues? Will those be major or minor updates, since we already differentiate between those? (I think minor updates can be merged automatically, since they should not require manual migrations, right?)
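Distinguishing major, minor and patch updates only works when image tags follow semantic versioning. A minimal sketch of the classification and of the "auto-merge minor" rule, assuming plain X.Y.Z tags (optionally prefixed with `v`); anything else is treated as unknown and left for review. Renovate already classifies updates itself and can auto-merge per update type (`packageRules` with `matchUpdateTypes`), so this only illustrates the rule. It says nothing about whether an update fixes a security issue; that information has to come from elsewhere, such as vulnerability advisories.

```python
import re
from typing import Optional, Tuple

# Plain semantic-version tags such as "1.2.3" or "v1.2.3".
_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> Optional[Tuple[int, int, int]]:
    match = _SEMVER.match(tag)
    if match is None:
        return None  # e.g. "latest" or date-based tags
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def update_type(old: str, new: str) -> str:
    """Classify an image tag change as major, minor, patch, none or unknown."""
    a, b = parse_tag(old), parse_tag(new)
    if a is None or b is None:
        return "unknown"
    if a[0] != b[0]:
        return "major"
    if a[1] != b[1]:
        return "minor"
    if a[2] != b[2]:
        return "patch"
    return "none"


def may_automerge(old: str, new: str) -> bool:
    # Hypothetical policy from this issue: minor and patch updates merge
    # automatically; major and unclassifiable updates need a human.
    return update_type(old, new) in {"minor", "patch"}
```

Note that semver only promises backwards-compatible minor releases if the upstream project actually follows it, so "minor means no manual migration" is a convention to verify per image rather than a guarantee.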
How should manual migrations be handled when all services are supposed to be deployed automatically through the CI/CD server?
How will the above configuration scale when, for example, introducing multiple worker nodes (i.e. Proxmox instances)? Should we rely on the Proxmox ecosystem to automatically decide which worker runs the VM? Is that even possible? Or should a static configuration be put in place?
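The two options can also be combined: a static pin where it matters, automatic placement otherwise. A minimal sketch of that rule, assuming the free memory per Proxmox node is already known (how it is fetched is left out); `Node` and `place` are hypothetical names, and "most free memory" is just one simple heuristic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_mem_mb: int


def place(vm_mem_mb: int, nodes: List[Node], pinned: Optional[str] = None) -> str:
    """Pick a node for a VM: honour a static pin, else the node with most free memory."""
    if pinned is not None:
        # Static configuration wins, but only for nodes that actually exist.
        if not any(n.name == pinned for n in nodes):
            raise ValueError(f"unknown node {pinned!r}")
        return pinned
    fitting = [n for n in nodes if n.free_mem_mb >= vm_mem_mb]
    if not fitting:
        raise RuntimeError("no node has enough free memory")
    return max(fitting, key=lambda n: n.free_mem_mb).name
```

Pinning is useful for VMs tied to local hardware or storage (e.g. a passed-through disk), while stateless VMs can go wherever there is room.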
How easy will it be for a less technical person to request a VM in such a system? Preferably, my dad could write a simple file requesting a VM that he can log into and mess around in, while static configuration (such as enforced firewall rules, proxies, DNS etc.) still applies. Preferably, this would be an Ubuntu (Server) VM, but how will the static configuration be possible if it is not NixOS?
How can monitoring be set up in this ecosystem? Can everything be done through static configuration? I have looked into Wazuh for this reason, but the package had not been configured for NixOS yet. Earlier this month I ran into an issue where a VM disk filled up without me noticing, and everything slowed down because of it. I want to avoid this at all costs in the future. What is the best way to approach this? I also need a centralized logging system as soon as possible, because of the number of services I am running.
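Whatever monitoring stack ends up being used, the check that would have caught the full disk is simple: alert well before the filesystem reaches 100%. A minimal sketch using only the Python standard library; the 85% threshold is an arbitrary assumption, and in practice this check would live in the monitoring/alerting tool rather than in a standalone script.

```python
import shutil
from typing import List, Tuple


def used_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def full_filesystems(paths: List[str], threshold: float = 0.85) -> List[Tuple[str, float]]:
    """Return (path, used fraction) for every path at or above `threshold`."""
    result = []
    for path in paths:
        fraction = used_fraction(path)
        if fraction >= threshold:
            result.append((path, fraction))
    return result


if __name__ == "__main__":
    for path, fraction in full_filesystems(["/"]):
        print(f"WARNING: {path} is {fraction:.0%} full")
```

Alerting on the trend (a disk steadily filling up) in addition to a fixed threshold would give even earlier warning.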
Within this ecosystem, would it be possible to run test deployments of Kubernetes? Or would a separate system be better suited for that? How much manual work would be needed to bootstrap the system in that case? Can it be done statically in combination with NixOS?
What if the action runners go down? What is the best way to keep the system running and still apply new updates? How bad is it if that happens when the runner runs on our own hardware (which means the runner is configured by the actions that run on that same runner)?
I want to look into setting up static security scanners as well, such as CodeRabbit for code reviews or Aikido Security for bugs in the code. How effective will those be? Can I run such systems locally, with AI as well? Without AI would be even better. Which parts of my data would be sent to those companies? That is something I want to avoid. Free tiers only.
How feasible is a project like this?