1 Introduction

Ambient CI is an engine for running continuous integration workloads in a safe and secure manner. This document explains the software architecture, but assumes the reader understands how to use the software. Acceptance criteria and how they are verified are documented in a separate document.

1.1 Very high level overview

The CI plan (the actions to execute for a CI run) is divided into three parts:

the actual plan, which gets executed in a virtual machine without network access and tightly constrained resources
the pre- and post-plans, which get executed on the host before and after the VM runs

The VM is implemented using QEMU, and is ephemeral. Each run gets a fresh VM, using a copy-on-write copy of the base image specified for the project. Ambient passes in any input files as tar archives mapped to virtual block devices. The guest in the VM packs any output files into tar archives that it writes to other virtual block devices. The VM does not mount file systems from the host, to reduce the risk of security problems in host file system code.

In addition, one VM serial port is used for the kernel console and another for the build log. Ambient captures both.

The execution in the VM is controlled by a part of Ambient called the "executor", which the VM loads from the first virtual block device, together with the list of actions to execute.
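The VM setup described above can be sketched as a function that builds a QEMU argument list. This is only an illustration: `VmSpec`, `drive`, and `qemu_args` are hypothetical names, the drive order and image formats are assumptions, and the options Ambient actually passes to QEMU are not shown in this document. The QEMU options used (`-drive`, `-nic none`, `-serial file:`) are standard ones.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical description of one VM run; not Ambient's actual type.
pub struct VmSpec {
    /// Copy-on-write copy of the project's base image (qcow2 is assumed).
    pub system_image: PathBuf,
    /// Tar archive holding the executor and the list of actions.
    pub executor: PathBuf,
    /// Tar archives the guest may only read (source, dependencies).
    pub read_only: Vec<PathBuf>,
    /// Files the guest writes its output tar archives to (artifacts, cache).
    pub read_write: Vec<PathBuf>,
    /// Host file that captures the kernel console serial port.
    pub console_log: PathBuf,
    /// Host file that captures the build log serial port.
    pub build_log: PathBuf,
}

/// Build one `-drive` option pair. Assumes the path contains no commas.
fn drive(path: &Path, format: &str, read_only: bool) -> Vec<String> {
    let mut opt = format!("file={},format={},if=virtio", path.display(), format);
    if read_only {
        opt.push_str(",readonly=on");
    }
    vec!["-drive".to_string(), opt]
}

/// Assemble the QEMU arguments for a VM without network access.
pub fn qemu_args(spec: &VmSpec) -> Vec<String> {
    let mut args = Vec::new();
    args.extend(drive(&spec.system_image, "qcow2", false));
    // The executor drive comes before the other data drives (assumed order).
    args.extend(drive(&spec.executor, "raw", true));
    for path in &spec.read_only {
        args.extend(drive(path, "raw", true));
    }
    for path in &spec.read_write {
        args.extend(drive(path, "raw", false));
    }
    // No network device at all for the plan VM.
    args.push("-nic".to_string());
    args.push("none".to_string());
    // First serial port: kernel console. Second: build log.
    for log in [&spec.console_log, &spec.build_log].iter() {
        args.push("-serial".to_string());
        args.push(format!("file:{}", log.display()));
    }
    args
}
```

No host directory is shared with the guest in this sketch; all data crosses the boundary as raw block devices, which matches the reasoning given above about avoiding host file system code.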

An Ambient CI run is roughly like this:

execute pre-plan actions on the host
pack source tree, dependencies, artifacts, and cache into tar archives
execute plan actions in VM, with source and dependencies as read-only block devices, and artifacts and cache as read/write
unpack cache and artifacts
execute post-plan actions on the host
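The sequence above can be sketched as a small driver that runs the steps in order. The names `Step`, `RUN_ORDER`, and `ci_run` are hypothetical, and stopping at the first failing step is an assumption; the list above does not say what happens when a step fails.

```rust
/// The five steps of a run, in the order listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    PrePlanOnHost,
    PackInputs,
    PlanInVm,
    UnpackOutputs,
    PostPlanOnHost,
}

/// The order in which the steps are executed.
pub const RUN_ORDER: [Step; 5] = [
    Step::PrePlanOnHost,
    Step::PackInputs,
    Step::PlanInVm,
    Step::UnpackOutputs,
    Step::PostPlanOnHost,
];

/// Run every step in order using the `run_step` callback.
/// Assumption: the run stops at the first step that returns an error.
pub fn ci_run<F>(mut run_step: F) -> Result<(), String>
where
    F: FnMut(Step) -> Result<(), String>,
{
    for step in RUN_ORDER.iter().copied() {
        run_step(step)?;
    }
    Ok(())
}
```

Passing the step implementation in as a callback keeps the ordering logic separate from the host and VM details, which makes the order easy to check in isolation.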

2 Design: Trusted VM (May, 2026)

(To be folded into the rest of document once the code changes are done.)

2.1 Problem statement

It is a problem that pre- and post-plan actions are executed directly on the host. It means the action implementations need to be vetted with extreme care, as the safety and security of the host depends on them not being problematic. One mistake and boom.

A problematic implementation might leak secrets, attack the host, attack other hosts, or otherwise abuse direct filesystem access on the host and unrestricted access to the network.

2.2 Suggested solution: trusted VM

An approach to mitigate and contain any problems is to execute the actions in dedicated virtual machines that have network access and only the secrets needed for the actions. These would be fresh and ephemeral, with constrained resources, the way the VM for the plan is. The only difference is network access. This approach is called the "trusted VM", where we trust the VM with network access.

This approach protects the host, but does not protect against attacks over the network.

To mitigate attacks using the network, an additional layer of protection would constrain network use to only allowed use, enforced on the host side rather than inside the VM. This could be implemented using a firewall or a proxy. However, the first stage of trusted VMs will not add this additional layer. Isolating pre- and post-plan actions in a VM is already an improvement, and implementing that will be enough work at once. A network protection layer can be added later as a separate step.

2.3 Implementation plan

Refactor src/run.rs to use a medium-level type to encapsulate running QEMU in a specific way: for pre-plan, for actual plan, and for post-plan. The type will handle setting up virtual block devices using provided tar archives, setting up the runnable plan and packing it into a tar archive, etc. Change the code to use the new type to run the current single VM.
Change src/run.rs to use the new type for pre- and post-plan actions.
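One possible shape for such a medium-level type is sketched below. The names `Phase` and `QemuRun` and all method signatures are hypothetical; the only property taken from this document is that the pre- and post-plan VMs have network access and the VM for the actual plan does not.

```rust
use std::path::PathBuf;

/// Which part of the CI plan a VM is being run for.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    PrePlan,
    Plan,
    PostPlan,
}

/// Hypothetical medium-level description of one QEMU run.
#[derive(Debug)]
pub struct QemuRun {
    phase: Phase,
    /// Tar archives to expose as virtual block devices: (path, read-only?).
    drives: Vec<(PathBuf, bool)>,
}

impl QemuRun {
    /// Start describing a run for the given phase, with no drives yet.
    pub fn new(phase: Phase) -> Self {
        QemuRun { phase, drives: Vec::new() }
    }

    /// Add a tar archive as a virtual block device.
    pub fn drive(mut self, tar: impl Into<PathBuf>, read_only: bool) -> Self {
        self.drives.push((tar.into(), read_only));
        self
    }

    /// Trusted VMs (pre- and post-plan) get network access; the plan VM does not.
    pub fn has_network(&self) -> bool {
        !matches!(self.phase, Phase::Plan)
    }

    /// The drives added so far, in the order they were added.
    pub fn drives(&self) -> &[(PathBuf, bool)] {
        &self.drives
    }
}
```

With a type like this, the code in src/run.rs would only choose a phase and list the archives, and the network decision would live in one place.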

This might be enough, but we'll see.

2.4 Problems found while implementing:

Context::set_plan_env allows a pre-plan action to set an environment variable for executing the actual plan. This is currently used by the rustup action to set CARGO_HOME to /ci/deps/rustup.

The mechanism is generic, but the implementation doesn't work in the trusted VM architecture, as there is no way to securely communicate changes to affect the actual plan, from within the VM. Any mechanism the executor uses, malicious code could also use. While I can't think of a way to exploit that, I see no point in assuming it can't be.

I can either drop the generic mechanism, or find a way to do it within Ambient, while the action is executed in the VM.

I could add a method to RunnableAction that gets called for pre-plan actions to set environment vars for actual plan execution. This would happen within Ambient, regardless of what happens in the VM. I could add a method ActionImpl::set_plan_envs, with a no-op default implementation, and call that in RunnablePlan::execute if the action succeeds.
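A minimal sketch of that idea follows. Only the names `ActionImpl` and `set_plan_envs`, and the `CARGO_HOME` value, come from this document; the signatures, the `Rustup`, `Mkdir`, and `Failing` types, and the `run_pre_plan` function are hypothetical stand-ins, and `execute` here merely stands in for running the action in the trusted VM.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified version of the action trait.
pub trait ActionImpl {
    /// Stand-in for executing the action inside the trusted VM.
    fn execute(&self) -> Result<(), String>;

    /// Called on the Ambient side after the action succeeds.
    /// The default implementation sets nothing.
    fn set_plan_envs(&self, _envs: &mut HashMap<String, String>) {}
}

/// Example action that overrides the hook, like the rustup action.
pub struct Rustup;

impl ActionImpl for Rustup {
    fn execute(&self) -> Result<(), String> {
        Ok(())
    }

    fn set_plan_envs(&self, envs: &mut HashMap<String, String>) {
        envs.insert("CARGO_HOME".to_string(), "/ci/deps/rustup".to_string());
    }
}

/// Example action that relies on the no-op default.
pub struct Mkdir;

impl ActionImpl for Mkdir {
    fn execute(&self) -> Result<(), String> {
        Ok(())
    }
}

/// Example action that fails, so its hook is never called.
pub struct Failing;

impl ActionImpl for Failing {
    fn execute(&self) -> Result<(), String> {
        Err("action failed".to_string())
    }

    fn set_plan_envs(&self, envs: &mut HashMap<String, String>) {
        envs.insert("SHOULD_NOT".to_string(), "appear".to_string());
    }
}

/// Run pre-plan actions and collect environment variables for the actual plan.
/// The variables come only from Ambient's own code, never from VM output.
pub fn run_pre_plan(actions: &[&dyn ActionImpl]) -> Result<HashMap<String, String>, String> {
    let mut envs = HashMap::new();
    for action in actions {
        action.execute()?;
        action.set_plan_envs(&mut envs);
    }
    Ok(envs)
}
```

The point of the design is that nothing running inside the VM can influence the collected variables: they are fixed by the action implementation compiled into Ambient.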
Post-plan actions need secrets. The current ones (rsync and dput) specifically need an SSH key. Possibly also a user certificate, depending on the SSH target host.

There are a number of ways in which this can be resolved. Ideally, the actions could use the secret (SSH key) without it being exposed to the VM. This might be achieved by running an SSH agent on the host and a proxy agent in the VM. However, this would take time to develop. Also, a proxy may not work for other kinds of secrets, such as HTTP API tokens.

Instead, I will first implement a simpler solution, where the trusted VM is given another virtual drive, with an SSH key, and optionally a user certificate. The virtual drive is extracted to /ci/secrets and SSH in the trusted VM is set up to use those.

The Ambient configuration will specify which, if any, SSH keys and certificates are put in the secrets drive. For simplicity, the drive will always be there, and so will /ci/secrets, but they may be empty.
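The configuration side of this could look roughly like the sketch below. The type `SecretsConfig`, its field names, the function `secrets_drive_contents`, and the file names `id` and `id-cert.pub` are all hypothetical; the actual configuration keys are not defined in this document. The `-cert.pub` suffix follows the OpenSSH convention of looking for a certificate next to the identity file.

```rust
use std::path::PathBuf;

/// Hypothetical configuration for what goes into the secrets drive.
#[derive(Debug, Default, Clone)]
pub struct SecretsConfig {
    /// Private SSH key on the host, if any.
    pub ssh_key: Option<PathBuf>,
    /// SSH user certificate on the host, if any.
    pub ssh_certificate: Option<PathBuf>,
}

/// List the files to pack into the secrets drive, as pairs of
/// (path on the host, file name under /ci/secrets in the VM).
/// The list may be empty: the drive is always created, even with no secrets.
pub fn secrets_drive_contents(cfg: &SecretsConfig) -> Vec<(PathBuf, String)> {
    let mut files = Vec::new();
    if let Some(key) = &cfg.ssh_key {
        files.push((key.clone(), "id".to_string()));
    }
    if let Some(cert) = &cfg.ssh_certificate {
        files.push((cert.clone(), "id-cert.pub".to_string()));
    }
    files
}
```

Returning an empty list rather than an error for an empty configuration mirrors the decision above that the drive and /ci/secrets always exist.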

I might later add support for generating a new key and a short-lived user certificate for each run. That will be dependent on SSH credential management for the SSH target, though. Thus, Ambient would only have a configuration option to run a command to create the temporary identity.

Ambient CI software architecture
