config: add MQUEUE, BRIDGE, NETFILTER support #128

Open
dustymabe wants to merge 3 commits into containers:main from dustymabe:dusty-kernel-configs

@dustymabe
Contributor

dustymabe commented May 26, 2026

See individual commit messages.

Essentially what I'm trying to do is run podman inside my krun VM and have it be completely isolated from the host, but also function well enough without special arguments to act generically like a container runtime. That is, I want to be able to navigate to a project's source code and type `make whatever`, and if that project leverages containers everything just works.

An example of this would be running `make ci-operator-config` from this repo without having to patch the Makefile like this:

```
diff --git a/Makefile b/Makefile
index 2e779fbd235..fe6999cced6 100644
--- a/Makefile
+++ b/Makefile
@@ -3,7 +3,7 @@ SHELL=/usr/bin/env bash -o errexit
 .PHONY: help check check-boskos check-core check-services check-validate-main-promotion dry-core core dry-services services all update release-controllers checkconfig jobs ci-operator-config registry-metadata boskos-config prow-config validate-step-registry new-repo branch-cut prow-config multi-arch-gen
 
 export CONTAINER_ENGINE ?= podman
-export CONTAINER_ENGINE_OPTS ?= --platform linux/amd64
+export CONTAINER_ENGINE_OPTS ?= --platform linux/amd64 --net=host --mount type=tmpfs,destination=/dev/mqueue
 export SKIP_PULL ?= false
 
 VOLUME_MOUNT_FLAGS = :z
```

Ultimately my goal is to sandbox AI agents so I'm comfortable allowing them to do more without worrying about my host system.

I understand especially the last commit adding BRIDGE/NETFILTER may cause the size of the kernel to increase and may not be desired since it enables many options. Is this generally a problem? Is there some sort of balance between "these config options enable an important use case" and the size increase?

dustymabe added 3 commits May 26, 2026 14:54
These config files were originally generated against older kernel
versions (6.6.59 for x86_64/aarch64/sev/windows, 6.12.20 for
tdx/riscv64) and had not been refreshed after the 6.12.87 rebase.

Run `make olddefconfig` against the 6.12.87 kernel sources to pick
up new defaults and resolve any missing or changed Kconfig symbols.

Assisted-by: <anthropic/claude-opus-4.6>
Signed-off-by: Dusty Mabe <dusty@dustymabe.com>

MQUEUE is set on aarch64 and without it we hit a common error
trying to run containers inside a libkrun VM. Start up libkrun:

```
$ podman run --net=host --rm --log-level=debug quay.io/fedora/fedora-minimal:44 echo hello
...
time="2026-05-18T19:55:54Z" level=debug msg="ExitCode msg: \"crun: mount `mqueue` to `dev/mqueue`:
no such device: oci runtime error\"" Error: OCI runtime error: crun: mount `mqueue` to `dev/mqueue`: No such device
```

Note this is essentially a revert of 62444be.

Assisted-by: <anthropic/claude-opus-4.6>
Signed-off-by: Dusty Mabe <dusty@dustymabe.com>
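
The `No such device` error above is what a mount attempt reports when the kernel has no `mqueue` filesystem type registered, which is what `CONFIG_POSIX_MQUEUE` provides. One quick way to confirm this from inside the guest is to look for `mqueue` in `/proc/filesystems`. This is a minimal sketch; the `has_fs` helper name is made up for illustration.

```shell
# has_fs NAME [FILE]: succeed if filesystem type NAME is listed in FILE
# (defaults to /proc/filesystems, where the kernel lists registered types).
has_fs() {
    grep -qw -- "$1" "${2:-/proc/filesystems}"
}

if has_fs mqueue; then
    echo "mqueue: supported"
else
    echo "mqueue: missing (mounting /dev/mqueue would fail)"
fi
```
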
If you want to start a podman container inside the krun VM and not
use `--net=host` (i.e. use netavark instead) then you need BRIDGE
support [1].

Additionally, enable CONFIG_NETFILTER and the full nftables stack
(NF_TABLES, NF_CONNTRACK, NF_NAT, NFT_MASQ, etc.) so that
podman/netavark can use nft without getting:

  src/mnl.c:66: Unable to initialize Netlink socket: Protocol not supported

Enable these on all three architectures: x86_64, aarch64, riscv64.

[1] podman-container-tools/podman#25201 (comment)

Assisted-by: <anthropic/claude-opus-4.6>
Signed-off-by: Dusty Mabe <dusty@dustymabe.com>
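
To check whether a given config file already enables the symbols this commit message names, something like the following could be used. It is only a sketch: the list covers the symbols spelled out above (the "etc." is not expanded), and `missing_symbols` and the config path are hypothetical names, not part of the repository.

```shell
# Symbols named in the commit message above; the full nftables stack
# needs more than these ("etc." in the message is not expanded here).
WANTED="BRIDGE NETFILTER NF_TABLES NF_CONNTRACK NF_NAT NFT_MASQ"

# missing_symbols FILE SYM...: print each SYM that FILE does not set to y or m.
missing_symbols() {
    cfg=$1; shift
    for sym in "$@"; do
        grep -Eq "^CONFIG_${sym}=[ym]\$" "$cfg" || echo "$sym"
    done
}

# Usage (the path is a placeholder for one of the repository's config files):
#   missing_symbols path/to/config $WANTED
```

An empty result means every listed symbol is built in or modular. It says nothing about symbols that the listed ones depend on.
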
@slp
Collaborator

slp commented May 27, 2026

> I understand especially the last commit adding BRIDGE/NETFILTER may cause the size of the kernel to increase and may not be desired since it enables many options. Is this generally a problem? Is there some sort of balance between "these config options enable an important use case" and the size increase?

There's never been a well-defined rule. If there's a use case, the change doesn't impact boot time and the size increase isn't too large, it can go in.

In this case, on aarch64 I see the size increases by just 64k, and there isn't a reason for this change to affect boot time. So LGTM, thanks!

@slp
Collaborator

slp commented May 27, 2026

I've just noticed the first commit, beyond aligning the config with 6.12.87, actually changes the current configuration, at least on aarch64 (for instance, disabling KVM). Please drop the first commit and just add the new configuration options (Makefile does an "olddefconfig" anyway).
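
An unintended change like the KVM one can be spotted by comparing the old and new config files symbol by symbol instead of reading the raw diff. The kernel tree ships `scripts/diffconfig` for this. The sketch below is a simplified stand-in that runs without a kernel tree, with `normalize` and `config_changes` as made-up helper names.

```shell
# normalize FILE: print "SYMBOL value" pairs, mapping "is not set" to n.
normalize() {
    sed -n \
        -e 's/^\(CONFIG_[A-Za-z0-9_]*\)=\(.*\)$/\1 \2/p' \
        -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1 n/p' "$1" | sort
}

# config_changes OLD NEW: list symbols whose value differs between the files
# ("<" lines come from OLD, ">" lines from NEW).
config_changes() {
    old_tmp=$(mktemp) && new_tmp=$(mktemp) || return 1
    normalize "$1" > "$old_tmp"
    normalize "$2" > "$new_tmp"
    diff "$old_tmp" "$new_tmp" | grep '^[<>]'
    rm -f "$old_tmp" "$new_tmp"
}
```

A symbol that went from `=y` to `is not set` shows up as a `<`/`>` pair, for example `< CONFIG_KVM y` followed by `> CONFIG_KVM n`.
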

@dustymabe
Contributor Author

dustymabe commented May 27, 2026

> I've just noticed the first commit, beyond aligning the config with 6.12.87, actually changes the current configuration, at least on aarch64 (for instance, disabling KVM).

oops. Yes, I see that now.

> Please drop the first commit and just add the new configuration options (Makefile does an "olddefconfig" anyway).

Do you know an easy way for me to translate "I want bridge and netfilter/nftables support" into the right config options?

With MQUEUE and KVM it was pretty easy because there were only a few options. For netfilter there are a ton of options, so going through some sort of "wizard" would help, but I think when you do that you end up with all the defaults from 6.12.87 getting updated too (i.e. the first commit of this PR, which is why I broke it out into a separate commit).
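
One way to keep such a change small is to write only the wanted symbols into a short fragment and overlay it on the existing config, leaving dependency resolution to the `make olddefconfig` run that the Makefile already performs. The kernel tree ships `scripts/kconfig/merge_config.sh` for this. The function below is only a simplified stand-in to show the idea, and the name `apply_fragment` is made up for illustration. It does not answer which symbols netfilter/nftables actually needs.

```shell
# apply_fragment CONFIG FRAGMENT: for every CONFIG_X=... line in FRAGMENT,
# drop any existing setting of X from CONFIG, then append the new settings.
apply_fragment() {
    cfg=$1; frag=$2
    grep '^CONFIG_[A-Za-z0-9_]*=' "$frag" | while IFS='=' read -r name rest; do
        tmp=$(mktemp)
        # Remove both the "=value" form and the "is not set" form of the symbol.
        sed -e "/^${name}=/d" -e "/^# ${name} is not set\$/d" "$cfg" > "$tmp"
        cat "$tmp" > "$cfg"
        rm -f "$tmp"
    done
    grep '^CONFIG_[A-Za-z0-9_]*=' "$frag" >> "$cfg"
}
```

Symbols whose dependencies are not met can still be dropped by `olddefconfig`, so the result is worth re-checking after the build regenerates the config.
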

@slp
Collaborator

slp commented May 27, 2026

> With MQUEUE and KVM it was pretty easy because there were only a few options. For netfilter there are a ton of options, so going through some sort of "wizard" would help, but I think when you do that you end up with all the defaults from 6.12.87 getting updated too (i.e. the first commit of this PR, which is why I broke it out into a separate commit).

Let me update the configs manually, and then you can rebase this PR on it.

@DaniD3v

DaniD3v commented May 28, 2026

Hey @dustymabe

I've coincidentally been working on the exact same thing over the last 2 days.
Maybe you wanna take a look at my mcp server / work on it together?
My Sandboxed MCP Server

I also modified the kernel modules to run docker (without any warnings so there's a bit more bloat). It would be nice to add these, too.
My Commit

(Update:)
I tested the size diff and apparently it is not half bad.

20868	libkrunfw.so.5.4.0-current
20996	libkrunfw.so.5.4.0-my-fork
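
For scale, the difference between the two figures can be worked out as below. The listing does not state its unit (it looks like `du` output, which would mean 1 KiB blocks by default, but that is an assumption), and `size_delta` is a made-up helper name.

```shell
# size_delta OLD NEW: print the absolute and relative growth from OLD to NEW.
size_delta() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "+%d (%.2f%%)\n", new - old, (new - old) * 100 / old }'
}

size_delta 20868 20996   # prints: +128 (0.61%)
```
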

This PR is also very useful, as it is a basically perfect solution for the dind problem,
especially considering that the krun container runtime seems to be installed by default with crun.

@mtjhrc
Contributor

mtjhrc commented May 29, 2026

FYI, the MQUEUE topic came up before in containers/libkrun#653. This was addressed in podman-container-tools/podman#28639 (inner podman no longer requires MQUEUE).
Just linking for context; we can still enable it if required for other stuff (e.g. docker, or older podman versions).

@dustymabe
Contributor Author

> FYI, the MQUEUE topic came up before in containers/libkrun#653. This was addressed in containers/podman#28639 (inner podman no longer requires MQUEUE).

Oh nice. That's not in a podman release just yet but will be nice once it is.

> Just linking for context; we can still enable it if required for other stuff (e.g. docker, or older podman versions).

yeah I'm not sure what we want the strategy to be - will lean on @slp for that.

If we drop that part from this PR I should add a commit to remove MQUEUE from the non-x86_64 configs because 62444be only removed it from x86_64.

@slp I think at this point I'm waiting on #128 (comment) - let me know if there's any action I need to take

@dustymabe
Contributor Author

> I've coincidentally been working on the exact same thing over the last 2 days.
> Maybe you wanna take a look at my mcp server / work on it together?
> My Sandboxed MCP Server

Interesting @DaniD3v - we're definitely working in the same area. I think it's slightly different, though. For me I launch the AI tool inside the krun VM and then it does everything in there.

So I think the difference is that for me opencode itself (and the entire VM it's running in) is the sandbox and for you opencode is run in some environment (host?) and then it runs commands via the MCP tool you wrote inside a sandbox. Is that a correct understanding?

@DaniD3v

DaniD3v commented May 29, 2026

> So I think the difference is that for me opencode itself (and the entire VM it's running in) is the sandbox and for you opencode is run in some environment (host?) and then it runs commands via the MCP tool you wrote inside a sandbox. Is that a correct understanding?

yes, exactly

> Interesting @DaniD3v - we're definitely working in the same area. I think it's slightly different, though. For me I launch the AI tool inside the krun VM and then it does everything in there.

I see. That's also interesting.
This is kind of off-topic for libkrunfw, but which advantages do you see in that approach?

The MCP server approach has a few (opinionated) advantages:
  1. The LLM can choose the image, so it never has to install something like rust but just uses the appropriate image.
  2. I can change directory in opencode (or whichever tool) and it keeps working.
  3. The distinction between write and read commands means I can set read commands to always-allow.

I guess you could combine them tho and then you'd still have advantages 1 and 3

@slp
Collaborator

slp commented Jun 1, 2026

@dustymabe Please consider rebasing on top of #129. I've sync'ed the config files on top of a 6.12.91 kernel.

@dustymabe
Contributor Author

> This is kind of offtopic for libkrunfw but which advantages do you see in that approach?

I'm still building my trust in AI in general. Right now my workflow shares only specific directories into the environment where my tool (opencode) runs. The container (really krunvm via `podman --runtime=krun`) is ephemeral except for a few things:

  1. The current working directory (usually a git repo)
  2. A containers storage volume (i.e. --privileged --volume opencode-containers-storage:/var/lib/containers) so "container things" that happened inside the krun VM can persist.
  3. A unique PROJECT_OPENCODE_STATEDIR for each directory on my host via -v ${PROJECT_OPENCODE_STATEDIR}:/root/.local/:z. Basically, the opencode state for each "starting" directory on my host has its own statedir so I can access older sessions I ran in that directory in the past.
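
Put together, the three mounts above might translate into an invocation along these lines. This is a sketch, not the author's actual script: the image name, the state directory layout, and the way the working directory is mounted are all placeholders, and the function only prints the command rather than running it.

```shell
# Hypothetical values: the image name and state directory layout are
# placeholders, not taken from the comment above.
IMAGE="localhost/opencode:latest"
PROJECT_OPENCODE_STATEDIR="${HOME:-/tmp}/opencode-state/$(basename "$PWD")"

# sandbox_cmd: print (not run) a podman invocation for the current directory.
sandbox_cmd() {
    echo podman run --rm -it --runtime=krun --privileged \
        --volume "$PWD:$PWD:z" --workdir "$PWD" \
        --volume opencode-containers-storage:/var/lib/containers \
        -v "${PROJECT_OPENCODE_STATEDIR}:/root/.local/:z" \
        "$IMAGE"
}

sandbox_cmd
```
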

@dustymabe
Contributor Author

> @dustymabe Please consider rebasing on top of #129. I've sync'ed the config files on top of a 6.12.91 kernel.

@slp will do!

I'm no kernel expert here. Do you know the best way to enable the default config values for nftables/netfilter and bridge in the configs? In my initial run I just asked AI to do it, but I'd like to do it a more proper way if there is one you recommend.
