feat: add Docker Compose deployment driver and image management #6

Draft
j4n wants to merge 20 commits into main from j4n/docker-support

@j4n j4n commented Apr 22, 2026

This adds a docker deploy driver that builds a Docker image in the builder,
transfers it to an LXC relay container (with nesting enabled), and
starts it via docker compose. DNS zones are extracted from the
running container and loaded into PowerDNS.
Specifically:

  • Add docker build, docker list, and docker prune subcommands for
    managing the Docker image cache in the builder independently of any
    relay deployment.
  • Add docker logs, docker ps, docker shell, and docker pull for
    inspecting running deployments and pulling pre-built GHCR images.
  • Support --image PATH on docker deploy to load a pre-exported
    tarball, skipping the build step entirely. zstd compression is
    used when available, plain tar otherwise.
  • Update README with Docker usage examples, driver documentation, and
    image management commands.
  • Already used in CI: the chatmail/docker repo's workflow calls
    cmlxc docker deploy --source ghcr:TAG to integration-test each
    image build (pending trigger PR in chatmail/relay).
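
The zstd-or-plain-tar choice for `--image` tarballs could look roughly like the sketch below. It assumes detection via a `zstd` binary on PATH; `export_command` and the exact pipeline are illustrative and not taken from this PR.

```python
import shlex
import shutil


def export_command(image, dest, have_zstd=None):
    """Return a shell command that writes a Docker image to a tarball.

    Hypothetical helper: compresses with zstd when the binary is
    available and falls back to a plain tar archive otherwise.
    """
    if have_zstd is None:
        # Assumption: availability is decided by looking for `zstd` on PATH.
        have_zstd = shutil.which("zstd") is not None
    image_q = shlex.quote(image)
    dest_q = shlex.quote(dest)
    if have_zstd:
        # `docker save` streams a tar archive to stdout; zstd compresses it.
        return f"docker save {image_q} | zstd -T0 -o {dest_q}"
    return f"docker save -o {dest_q} {image_q}"
```

Passing `have_zstd` explicitly keeps the decision testable without depending on what is installed on the host.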

Try it:

# Initialize the environment (if not already done)
cmlxc init

# Deploy a relay via Docker
cmlxc docker deploy --source @main dk0

# Or pull a pre-built image from GHCR
cmlxc docker deploy --source ghcr:main dk0

# Or pre-build an image and deploy from tarball
cmlxc docker build --source @main --output ./chatmail.tar.zst
cmlxc docker deploy --image ./chatmail.tar.zst dk1

# Inspect running services
cmlxc docker ps dk0
cmlxc docker logs dk0 -f

# Manage the image cache
cmlxc docker list
cmlxc docker prune --dry-run
cmlxc docker prune --all

# Run tests against the Docker relay
cmlxc test-mini dk0
cmlxc test-cmdeploy dk0

@j4n j4n marked this pull request as draft April 22, 2026 15:30
j4n added 12 commits April 22, 2026 17:31
Containers with Docker or other networking can expose IPs on multiple
interfaces. _extract_ip() now accepts an optional subnet filter so
wait_ready() and list_managed() only pick addresses on incusbr0.
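
A minimal sketch of the subnet filtering described here, using only the standard library. `pick_address` and the example addresses are hypothetical; the real `_extract_ip()` signature is not shown in this conversation.

```python
import ipaddress


def pick_address(addresses, subnet=None):
    """Return the first IPv4 address, optionally restricted to `subnet`.

    `addresses` is a list of address strings as they might be reported
    across all interfaces of a container; `subnet` is a CIDR string.
    """
    network = ipaddress.ip_network(subnet) if subnet else None
    for raw in addresses:
        addr = ipaddress.ip_address(raw)
        if addr.version != 4:
            continue  # skip IPv6 entries
        if network is None or addr in network:
            return raw
    return None
```

Without a filter, a Docker bridge address such as `172.17.0.1` could be picked first; with the bridge subnet passed in, only addresses on that network qualify.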

Move the initialization check (DNS container running + base image
present) from cli._check_init() into Incus.check_init() so that
drivers can call it without depending on the CLI module.

Allows drivers to pass additional Incus config keys (e.g.
security.nesting=true for Docker-in-LXC) when launching containers.
Threaded through Container and RelayContainer.
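
As a rough illustration of passing extra config keys through to the launch command: the sketch below assumes `extra_config` is a dict and uses placeholder label keys, since the real `LABEL_KEY` / `LABEL_DOMAIN` values are not shown here.

```python
def launch_config_args(domain, extra_config=None):
    """Build `-c key=value` arguments for a container launch command.

    The two label keys are placeholders, not the project's real constants.
    """
    cfg = [
        "-c", "user.example-managed=true",
        "-c", f"user.example-domain={domain}",
    ]
    # Assumption: extra keys arrive as a mapping of config key to value.
    for key, value in (extra_config or {}).items():
        cfg += ["-c", f"{key}={value}"]
    return cfg
```

A Docker-in-LXC driver would then call something like `launch_config_args(domain, {"security.nesting": "true"})`.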

Fresh containers from cached images have stale package lists, causing
dnsutils install to fail with unmet dependencies.

Move the initenv.sh hook from CmdeployDriver.on_init_relay() into the
Driver base class as the default implementation -- both cmdeploy and
docker drivers used identical bodies.

Extract run_cmdeploy_pytest() as a standalone function so that any
driver sharing the cmdeploy test suite (currently CmdeployDriver and
DockerDriver) can call it without duplicating the env_exports / pytest
command construction.

initenv.sh already uses uv when available. This ensures it's
installed on the builder container so all drivers benefit.
Installs to /usr/local/bin so it's on PATH for non-interactive shells.

When the source ref is a full 40-char SHA (e.g. from CI dispatch),
the shallow git-main clone won't have it. Detect this case and
fetch just that commit with --depth 1 before checkout.
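
The detection could be sketched as follows. `is_full_sha` and `checkout_commands` are hypothetical names; only the `git fetch --depth 1 origin <ref>` step is taken from the diff in this PR, and the regex is an assumption about what counts as a full SHA.

```python
import re

# Assumption: a "full SHA" is exactly 40 lowercase hex characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_full_sha(ref):
    """Return True if `ref` looks like a full 40-character commit SHA."""
    return FULL_SHA.fullmatch(ref) is not None


def checkout_commands(ref):
    """Return the git commands needed to check out `ref` in a shallow clone."""
    if is_full_sha(ref):
        # A shallow clone of main will not contain an arbitrary commit,
        # so fetch just that commit before checking it out.
        return [f"git fetch --depth 1 origin {ref}", f"git checkout {ref}"]
    return [f"git checkout {ref}"]
```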

Query running Docker Compose service names from a container via
incus exec. Used by the docker driver's ps subcommand and SSH
config generation.
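
One possible shape for this query, as a sketch only. `docker compose ps --services` prints one service name per line; whether the driver uses exactly this command, and from which working directory, is an assumption, and both function names are hypothetical.

```python
def compose_services_cmd(container):
    """Build an `incus exec` command listing Compose services in a container."""
    return ["incus", "exec", container, "--",
            "docker", "compose", "ps", "--services"]


def parse_service_names(output):
    """Parse one-name-per-line output into a sorted list of unique names."""
    return sorted({line.strip() for line in output.splitlines() if line.strip()})
```

Keeping command construction and output parsing separate makes the parsing testable without a running container.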

When no explicit -v flags are passed and RUNNER_DEBUG=1 is set
(GitHub Actions "Enable debug logging" rerun), auto-bump to -vvv.
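
A small sketch of that rule, assuming verbosity is tracked as a count of `-v` flags; `effective_verbosity` is a hypothetical name.

```python
import os


def effective_verbosity(cli_verbosity, environ=None):
    """Return the verbosity level to use for this run.

    An explicit -v count always wins; with no -v flags and
    RUNNER_DEBUG=1 in the environment, the level is bumped to 3 (-vvv).
    """
    environ = os.environ if environ is None else environ
    if cli_verbosity == 0 and environ.get("RUNNER_DEBUG") == "1":
        return 3
    return cli_verbosity
```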

- Register DockerDriver in DRIVER_BY_NAME
- test-cmdeploy: dispatch to driver class from container metadata,
  add --relay-ref option
- Fix _print_builder_repos to use driver REPO_NAME (avoids dupes)

- Add cmlxc_ref input to test feature branches
- Disable AppArmor for Docker-in-LXC systemd support
- Cache localchat-docker image (strip Docker images before export)
- Split cache into restore/save for better failure handling
- Per-service failure diagnostics (dovecot, postfix, failed units)
- Install incus-base instead of full incus package

Add DockerDriver for deploying chatmail relays via Docker Compose
inside LXC containers (Docker-in-LXC with security.nesting).

Driver capabilities:
- Build images from relay source, tag by git SHA
- Transfer between builder and relay via piped docker save/load
- Pull pre-built images from GHCR (--source ghcr:TAG)
- Load local tarballs (--image, zstd if available)
- Healthcheck polling with --since log streaming at -vv
- SSH forwarding into Docker containers for test compatibility
- DNS zone extraction and PowerDNS loading
- Tiered image pruning (default/deep/all)
- security.privileged gated behind CI=true

CLI subcommands: deploy, build, list, pull, logs, ps, shell, prune
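
The healthcheck polling listed above could be structured along these lines. This is a generic sketch, not the driver's code: `wait_healthy` and its parameters are hypothetical, and `get_status` stands in for whatever reads the container's health state (Docker reports `starting`, `healthy`, or `unhealthy`).

```python
import time


def wait_healthy(get_status, timeout=120.0, interval=2.0, sleep=time.sleep):
    """Poll `get_status()` until it returns "healthy" or `timeout` elapses.

    Returns True once the status is "healthy", False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == "healthy":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Injecting `sleep` allows the loop to be exercised in tests without real delays.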
@j4n j4n force-pushed the j4n/docker-support branch from 59363ea to 20a246d Compare April 22, 2026 15:35
# Disable AppArmor restrictions so Docker-in-LXC containers
# can run systemd (needs cgroup notification socket access).
sudo systemctl stop apparmor || true
sudo apparmor_parser -R /etc/apparmor.d/* 2>/dev/null || true
Author

not entirely sure if this is needed or if we can just run lxc with unbound profile

incus exec "$c" -- journalctl --no-pager -n 200 || true
# Dump Docker container logs if present
svc=chatmail
if incus exec "$c" -- docker ps -a --format '{{.Names}}' 2>/dev/null | grep -q "$svc"; then
Author

we could consider making this a helper script in the dockerized build to facilitate log extraction for debugging.

# Publish the builder LXC container as a cached image (the Docker
# container inside gets recreated on compose up, so the LXC is clean).
# Only skip localchat-cmdeploy on failure -- it bakes deploy state
# directly into the LXC and would carry broken config into the next run.
Author

clarify that we can cache the docker image container as well because it does not contain state in this case - maybe clean up /opt completely

Comment thread src/cmlxc/cli.py

Standard workflow:
- init -> deploy-cmdeploy/deploy-madmail -> test-cmdeploy/test-madmail/test-mini.
+ init -> deploy-cmdeploy/deploy-madmail/docker deploy -> test-*/test-mini.
Author

that does not make sense

Comment thread src/cmlxc/cli.py
for name in DRIVER_BY_NAME:
path = f"/root/{name}-git-main"
seen = set()
for name, drv_cls in DRIVER_BY_NAME.items():
Author

clarify change

Comment thread src/cmlxc/container.py
cfg = []
cfg += ("-c", f"{LABEL_KEY}=true")
cfg += ("-c", f"{LABEL_DOMAIN}={self.domain}")
if extra_config:
Author

@j4n j4n Apr 23, 2026

purge; this was added for --slow, which is now gone. I did drop commit bc9f0 but it seems it's still in here

Comment thread src/cmlxc/driver_base.py
out.print(f" Fetching {cls.REPO_NAME}-git-main from upstream ...")
bld_ct.bash(f"cd {tmp_dest} && git fetch origin")

# Install uv for faster venv/pip operations (used by initenv.sh)
Author

this did speed things up a bunch, but with cached images it might not give us much, and it depends on a relay PR that's not there yet.

Comment thread src/cmlxc/driver_base.py
f"cd {repo_path} && "
f"git fetch --depth 1 origin {source.ref}"
)
elif source.ref != "main":
Author

need to double check this catches all variants we need.

Comment thread src/cmlxc/driver_cmdeploy.py Outdated
CMDEPLOY = "cmdeploy"


def run_cmdeploy_pytest(driver, second_domain=None):
Author

should be called run test_cmdeploy

Comment thread src/cmlxc/incus.py
self._bridge_subnet = NotImplemented

@property
def bridge_subnet(self):
Author

could clarify why we need it

Comment thread README.md
cmlxc docker ps dk0
cmlxc docker logs dk0
cmlxc docker logs dk0 -f

Author

maybe detail image cleanup as well

"config",
"set",
ct.name,
"security.nesting=true",
Author

double check - we're setting this in the ci again? necessary?

Comment thread src/cmlxc/driver_docker.py Outdated
"""Display docker disk usage summary from builder."""
raw = bld_ct.bash("docker system df", check=False)
if raw:
for line in raw.strip().splitlines():
Author

this printer wraparound is ripe for a wrapper

)


_PRUNE_COMMANDS = {
Author

this probably should come first, check if the autoprune is mentioned in all the right places

)


def build_docker_cmd(args, out):
Author

this image building has a substantial line count with export etc.; maybe we can take it out, and estimate how much that would save. build would then require either local docker or a pushed image from the registry, which might just be enough.

Comment thread src/cmlxc/driver_docker.py Outdated
"service",
nargs="?",
default=DOCKER_COMPOSE_SERVICE,
metavar="SERVICE",
Author

unnecessary metavars

Comment thread src/cmlxc/driver_docker.py Outdated
)
group.add_argument(
"--all",
dest="prune_all",
Author

could reduce LOC by making this a subcommand maybe


self.ct.write_deploy_state(DOCKER, source=source)

def _load_local_image(self):
Author

that's nice to have but probably not really needed, along with export. either we deal with docker and have it, or we don't, I suppose

("healthcheck state",
f"docker inspect {svc} --format '{{{{json .State.Health}}}}' 2>/dev/null"),
("dovecot journal",
f"docker exec {svc} journalctl -u dovecot --no-pager -n 30 2>&1"),
Author

this section is duplicated I think

self.out.print(f" {line}")

def _patch_container_ini(self):
"""Apply test rate-limit overrides inside the Docker container.
Author

we added this, but cmdeploy already has this; we should probably leverage that

)

def _load_dns(self, dns_ct):
"""Extract DNS zone from Docker container and load into PowerDNS."""
Author

must be done in every driver, consolidate?

mode="600",
)

def _get_image_relay_sha(self):
Author

I smell duplication

j4n added 8 commits April 28, 2026 08:56
The `git reset --hard origin/{ref}` is only useful for branch refs
(fast-forward to latest remote). For SHA refs it always fails silently
since there's no remote tracking branch. Only run it for branch refs.

so we can reuse it for docker

- Extract get_image_label_sha() helper, dedup docker inspect
  label extraction in pull_image and _get_image_relay_sha
- Extract _print_indented() helper for show_docker_df and
  _dump_docker_logs
- Remove redundant metavar="RELAY" and metavar="SERVICE"
- Simplify prune: positional level arg instead of --deep/--all flags
- Clarify ensure_docker() docstring (why nesting is set here)
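
The prune simplification (a positional level instead of `--deep`/`--all` flags) might look like this in argparse. The level names come from the "Tiered image pruning (default/deep/all)" item and `--dry-run` from the usage examples; the parser layout itself is an assumption.

```python
import argparse

PRUNE_LEVELS = ("default", "deep", "all")


def build_prune_parser():
    """Build a parser for a hypothetical `cmlxc docker prune [LEVEL]`."""
    parser = argparse.ArgumentParser(prog="cmlxc docker prune")
    parser.add_argument(
        "level",
        nargs="?",
        choices=PRUNE_LEVELS,
        default="default",
        help="how aggressively to prune cached images",
    )
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would be removed without removing it")
    return parser
```

With this layout, `prune`, `prune deep`, and `prune all --dry-run` all parse, and argparse rejects unknown levels without extra validation code.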
@j4n j4n force-pushed the j4n/docker-support branch from 0e2b4d0 to d5f6b04 Compare April 28, 2026 08:37