1 Introduction
Ambient CI is an engine for running continuous integration workloads in a safe and secure manner. This document explains the software architecture, but assumes the reader understands how to use the software. Acceptance criteria and how they are verified are documented in a separate document.
1.1 Very high level overview
The CI plan (the actions to execute for a CI run) is divided into three parts:
- the actual plan, which gets executed in a virtual machine without network access and with tightly constrained resources
- the pre-plan and the post-plan, which get executed on the host before and after the VM runs
The VM is implemented using QEMU, and is ephemeral. Each run gets a fresh VM that uses a copy-on-write copy of the base image specified for the project. Ambient passes in any input files as tar archives mapped to virtual block devices. The guest in the VM packs any output files into tar archives that it writes to other virtual block devices. To reduce the risk of security problems in host file system code, the VM does not mount any file systems from the host.
In addition, one VM serial port is used for the kernel console and another for the build log. Ambient captures both.
The execution in the VM is controlled by a part of Ambient called the "executor", which the VM loads from the first virtual block device, together with the list of actions to execute.
An Ambient CI run is roughly like this:
- execute pre-plan actions on the host
- pack source tree, dependencies, artifacts, and cache into tar archives
- execute plan actions in VM, with source and dependencies as read-only block devices, and artifacts and cache as read/write
- unpack cache and artifacts
- execute post-plan actions on the host
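The steps above can be modelled as a small sketch. The type names (Stage, Location, Archive, Access) are hypothetical and chosen only for illustration; they are not taken from the Ambient source code. The sketch only records the order of the stages, where each one runs in the current architecture, and which archives the VM may write to.

```rust
/// Where a stage of a CI run executes (hypothetical type, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Location {
    Host,
    Vm,
}

/// The stages of a CI run, in the order listed above (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    PrePlan,
    PackInputs,
    Plan,
    UnpackOutputs,
    PostPlan,
}

impl Stage {
    /// The stages in execution order.
    const ORDER: [Stage; 5] = [
        Stage::PrePlan,
        Stage::PackInputs,
        Stage::Plan,
        Stage::UnpackOutputs,
        Stage::PostPlan,
    ];

    /// In the current architecture only the actual plan runs in the VM.
    fn location(self) -> Location {
        match self {
            Stage::Plan => Location::Vm,
            _ => Location::Host,
        }
    }
}

/// How the VM may access a virtual block device.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Access {
    ReadOnly,
    ReadWrite,
}

/// The tar archives passed to the VM for the actual plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Archive {
    Source,
    Dependencies,
    Artifacts,
    Cache,
}

impl Archive {
    /// Source and dependencies are read-only; artifacts and cache are read/write.
    fn access(self) -> Access {
        match self {
            Archive::Source | Archive::Dependencies => Access::ReadOnly,
            Archive::Artifacts | Archive::Cache => Access::ReadWrite,
        }
    }
}
```

For example, Stage::ORDER[2] is Stage::Plan, the only stage whose location is Location::Vm, and Archive::Cache.access() is Access::ReadWrite.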
2 Design: Trusted VM (May, 2026)
(To be folded into the rest of the document once the code changes are done.)
2.1 Problem statement
It is a problem that pre- and post-plan actions are executed directly on the host. It means the action implementations need to be vetted with extreme care, as the safety and security of the host depends on them not being problematic. One mistake and boom.
A problematic implementation might leak secrets, attack the host, attack other hosts, or otherwise abuse direct filesystem access on the host and unrestricted access to the network.
2.2 Suggested solution: trusted VM
An approach to mitigate and contain any problems is to execute the actions in dedicated virtual machines that have network access and only the secrets needed for the actions. These would be fresh and ephemeral, with constrained resources, the way the VM for the plan is. The only difference is network access. This approach is called the "trusted VM", where we trust the VM with network access.
This approach protects the host, but does not protect against attacks over the network.
To mitigate attacks that use the network, an additional layer of protection would restrict network use to what is explicitly allowed, enforced on the host side rather than inside the VM. This could be implemented using a firewall or a proxy. However, the first stage of trusted VMs will not add this additional layer. Isolating pre- and post-plan actions in a VM is already an improvement, and implementing that will be enough work at once. A network protection layer can be added later as a separate step.
2.3 Implementation plan
- Refactor src/run.rs to use a medium-level type to encapsulate running QEMU in a specific way: for pre-plan, for actual plan, and for post-plan. The type will handle setting up virtual block devices using provided tar archives, setting up the runnable plan and packing it into a tar archive, etc. Change the code to use the new type to run the current single VM.
- Change src/run.rs to use the new type for pre- and post-plan actions.
This might be enough, but we'll see.
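As a rough illustration of what such a medium-level type could look like, here is a minimal sketch. Everything in it is an assumption: the names VmKind, Drive, and VmRun, the builder-style method, and the choice to make the executor drive read-only are all hypothetical and do not describe the actual code in src/run.rs. The sketch only captures two points from the text: the executor and runnable plan go on the first virtual block device, and only the pre- and post-plan VMs get network access.

```rust
use std::path::PathBuf;

/// Which part of the CI plan a VM runs (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VmKind {
    PrePlan,
    Plan,
    PostPlan,
}

impl VmKind {
    /// Only the trusted VMs (pre- and post-plan) are given network access.
    fn has_network(self) -> bool {
        !matches!(self, VmKind::Plan)
    }
}

/// A tar archive exposed to the guest as a virtual block device.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Drive {
    tar: PathBuf,
    read_only: bool,
}

/// Description of one VM run: the base image and the drives to attach.
#[derive(Debug)]
struct VmRun {
    kind: VmKind,
    base_image: PathBuf,
    drives: Vec<Drive>,
}

impl VmRun {
    /// The executor and the runnable plan always go on the first drive.
    /// Making that drive read-only is an assumption of this sketch.
    fn new(kind: VmKind, base_image: impl Into<PathBuf>, executor_tar: impl Into<PathBuf>) -> Self {
        VmRun {
            kind,
            base_image: base_image.into(),
            drives: vec![Drive {
                tar: executor_tar.into(),
                read_only: true,
            }],
        }
    }

    /// Attach a further tar archive as the next virtual block device.
    fn drive(mut self, tar: impl Into<PathBuf>, read_only: bool) -> Self {
        self.drives.push(Drive {
            tar: tar.into(),
            read_only,
        });
        self
    }
}
```

With a type along these lines, the three kinds of VM would differ only in the VmKind value and in the drives attached, which is what would let src/run.rs reuse one code path for all of them.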
2.4 Problems found while implementing:
- Context::set_plan_env allows a pre-plan action to set an environment variable for executing the actual plan. This is currently used by the rustup action to set CARGO_HOME to /ci/deps/rustup.

  The mechanism is generic, but the implementation doesn't work in the trusted VM architecture, as there is no way to securely communicate, from within the VM, changes that affect the actual plan. Any mechanism the executor uses, malicious code could also use. While I can't think of a way to exploit that, I see no point in assuming it can't be exploited.

  I can either drop the generic mechanism, or find a way to do it within Ambient, while the action is executed in the VM.

  I could add a method to RunnableAction that gets called for pre-plan actions to set environment variables for actual plan execution. This would happen within Ambient, regardless of what happens in the VM. I could add a method ActionImpl::set_plan_envs, with a no-op default implementation, and call that in RunnablePlan::execute if the action succeeds.

- Post-plan actions need secrets. The current ones (rsync and dput) specifically need an SSH key. Possibly also a user certificate, depending on the SSH target host.

  There are a number of ways in which this can be resolved. Ideally, the actions could use the secret (SSH key), but it would not be exposed to the VM. This might be achieved by running an SSH agent on the host, and a proxy agent in the VM. However, this would take time to develop. Also, a proxy may not work for other kinds of secrets, such as HTTP API tokens.

  Instead, I will first implement a simpler solution, where the trusted VM is given another virtual drive, with an SSH key, and optionally a user certificate. The virtual drive is extracted to /ci/secrets and SSH in the trusted VM is set up to use those.

  The Ambient configuration will specify which, if any, SSH keys and certificates are put in the secrets drive. For simplicity, the drive will always be there, and so will /ci/secrets, but they may be empty.

  I might later add support for generating a new key and a short-lived user certificate for each run. That will be dependent on SSH credential management for the SSH target, though. Thus, Ambient would only have a configuration option to run a command to create the temporary identity.
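The set_plan_envs idea from the first item can be sketched as follows. This is a simplified illustration under stated assumptions, not the real Ambient code: the trait shape, the method signatures, the Rustup and Failing types, and the run_pre_plan function (standing in for RunnablePlan::execute) are all hypothetical. Only the method name set_plan_envs, its no-op default, and the CARGO_HOME value come from the text above.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the action trait (signatures are assumptions).
trait ActionImpl {
    /// Run the action. In the trusted VM design this would happen in the VM.
    fn execute(&self) -> Result<(), String>;

    /// Environment variables to set for the actual plan.
    /// Evaluated within Ambient on the host; the default does nothing.
    fn set_plan_envs(&self, _envs: &mut HashMap<String, String>) {}
}

/// Hypothetical stand-in for the rustup pre-plan action.
struct Rustup;

impl ActionImpl for Rustup {
    fn execute(&self) -> Result<(), String> {
        Ok(())
    }

    fn set_plan_envs(&self, envs: &mut HashMap<String, String>) {
        envs.insert("CARGO_HOME".to_string(), "/ci/deps/rustup".to_string());
    }
}

/// Hypothetical action that always fails, to show that a failed action
/// stops the run before any further variables are collected.
struct Failing;

impl ActionImpl for Failing {
    fn execute(&self) -> Result<(), String> {
        Err("action failed".to_string())
    }
}

/// Stand-in for RunnablePlan::execute: run each pre-plan action and, only if
/// it succeeds, let it contribute environment variables for the actual plan.
fn run_pre_plan(actions: &[&dyn ActionImpl]) -> Result<HashMap<String, String>, String> {
    let mut envs = HashMap::new();
    for action in actions {
        action.execute()?;
        action.set_plan_envs(&mut envs);
    }
    Ok(envs)
}
```

The point of the design is that the environment variables are computed by trusted code in Ambient itself, so nothing running inside the VM can influence them.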