Skip to content
This repository was archived by the owner on May 6, 2020. It is now read-only.
This repository was archived by the owner on May 6, 2020. It is now read-only.

changing the p9fs options for i/o performance #1095

@zeigerpuppy

Description

@zeigerpuppy

Description of problem

tuning the p9fs block size and cache modes allows significant performance boost (10x) in KVM. Is there a process for setting these in cc-runtime?

Example

In KVM, I have found two options that significantly increase IO (see below).

This is on a server with a raidz2 ZFS array with 8x Micron 9100 1.2TB NVMe SSDs. There's plenty of head room, Raw IO on this array is around 3GB/sec/process plateauing at about 30GB/sec for 20 processes in iozone3.

With KVM guests using plan9 file system, it looks possible to get about 1GB/sec per CPU but we're getting only about 130MB/sec with clear containers and bind-mounted storage.

Host:

as the filesystem (ZFS) is consistent on the host by design, it's safe to use the passthough mode

<filesystem type='mount' accessmode='passthrough'>
   <source dir='/export/to/guest'/>
   <target dir='mount_tag'/>
</filesystem>

Client

in the mount options, adjusting the msize (packet payload in bytes) and disabling the client cache has a huge effect on I/O

msize=524288,cache=none

Actual result

With my standard KVM clients on the same host, I get about 1GB/sec/process (measured with iozone3). With the cc-runtime backed docker storage (bind mounts) I get only 130MB/sec.

Any help in setting these options in the cc-runtime options would be great as they are critical to good performance.


Settings output

[Runtime]
  Debug = false
  [Runtime.Version]
    Semver = "3.0.23"
    Commit = "64d2226"
    OCI = "1.0.1"
  [Runtime.Config]
    Path = "/usr/share/defaults/clear-containers/configuration.toml"

[Hypervisor]
  MachineType = "pc"
  Version = "QEMU emulator version 2.7.1(2.7.1+git.d4a337fe91-11.cc), Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers"
  Path = "/usr/bin/qemu-lite-system-x86_64"
  Debug = false
  BlockDeviceDriver = "virtio-scsi"

[Image]
  Path = "/usr/share/clear-containers/cc-20640-agent-6f6e9e.img"

[Kernel]
  Path = "/usr/share/clear-containers/vmlinuz-4.14.22-86.container"
  Parameters = ""

[Proxy]
  Type = "ccProxy"
  Version = "Version: 3.0.23+git.3cebe5e"
  Path = "/usr/libexec/clear-containers/cc-proxy"
  Debug = false

[Shim]
  Type = "ccShim"
  Version = "shim version: 3.0.23 (commit: 205ecf7)"
  Path = "/usr/libexec/clear-containers/cc-shim"
  Debug = false

[Agent]
  Type = "hyperstart"
  Version = "<<unknown>>"

[Host]
  Kernel = "4.9.0-6-amd64"
  Architecture = "amd64"
  VMContainerCapable = true
  [Host.Distro]
    Name = "Debian GNU/Linux"
    Version = "9"
  [Host.CPU]
    Vendor = "GenuineIntel"
    Model = "Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz"

Runtime config files

Runtime default config files

/usr/share/defaults/clear-containers/configuration.toml
/usr/share/defaults/clear-containers/configuration.toml

Runtime config file contents

Output of "cat "/etc/clear-containers/configuration.toml"":

# XXX: WARNING: this file is auto-generated.
# XXX:
# XXX: Source file: "config/configuration.toml.in"
# XXX: Project:
# XXX:   Name: Intel® Clear Containers
# XXX:   Type: cc

[hypervisor.qemu]
path = "/usr/bin/qemu-lite-system-x86_64"
kernel = "/usr/share/clear-containers/vmlinuz.container"
image = "/usr/share/clear-containers/clear-containers.img"
machine_type = "pc"

# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc.
#
# WARNING: - any parameter specified here will take priority over the default
# parameter value of the same name used to start the virtual machine.
# Do not set values here unless you understand the impact of doing so as you
# may stop the virtual machine from booting.
# To see the list of default parameters, enable hypervisor debug, create a
# container and look for 'default-kernel-parameters' log entries.
kernel_params = ""

# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""

# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""

# Default number of vCPUs per POD/VM:
# unspecified or 0                --> will be set to 1
# < 0                             --> will be set to the actual number of physical cores
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores
default_vcpus = 1


# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Until 30 devices per bridge can be hot plugged.
# * Until 5 PCI bridges can be cold plugged per VM.
#   This limitation could be a bug in qemu or in the kernel
# Default number of bridges per POD/VM:
# unspecified or 0   --> will be set to 1
# > 1 <= 5           --> will be set to the specified number
# > 5                --> will be set to 5
default_bridges = 1

# Default memory size in MiB for POD/VM.
# If unspecified then it will be set 2048 MiB.
#default_memory = 2048

# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's 
# root file system is backed by a block device, the block device is passed
# directly to the hypervisor for performance reasons. 
# This flag prevents the block device from being passed to the hypervisor, 
# 9pfs is used instead to pass the rootfs.
disable_block_device_use = false

# Block storage driver to be used for the hypervisor in case the container
# rootfs is backed by a block device. This is either virtio-scsi or 
# virtio-blk.
block_device_driver = "virtio-scsi"

# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true

# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically 
# result in memory pre allocation
#enable_hugepages = true

# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true

# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. This extra output is added
# to the proxy logs, but only when proxy debug is also enabled.
# 
# Default false
#enable_debug = true

# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
# 
#disable_nesting_checks = true

[proxy.cc]
path = "/usr/libexec/clear-containers/cc-proxy"

# If enabled, proxy messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[shim.cc]
path = "/usr/libexec/clear-containers/cc-shim"

# If enabled, shim messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[agent.cc]
# There is no field for this section. The goal is only to be able to
# specify which type of agent the user wants to use.

[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
#
# Internetworking model
# Determines how the VM should be connected to the
# the container network interface
# Options:
#
#   - bridged
#     Uses a linux bridge to interconnect the container interface to
#     the VM. Works for most cases except macvlan and ipvlan.
#
#   - macvtap
#     Used when the Container network interface can be bridged using
#     macvtap.
internetworking_model="bridged"

Output of "cat "/usr/share/defaults/clear-containers/configuration.toml"":

# XXX: WARNING: this file is auto-generated.
# XXX:
# XXX: Source file: "config/configuration.toml.in"
# XXX: Project:
# XXX:   Name: Intel® Clear Containers
# XXX:   Type: cc

[hypervisor.qemu]
path = "/usr/bin/qemu-lite-system-x86_64"
kernel = "/usr/share/clear-containers/vmlinuz.container"
image = "/usr/share/clear-containers/clear-containers.img"
machine_type = "pc"

# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc.
#
# WARNING: - any parameter specified here will take priority over the default
# parameter value of the same name used to start the virtual machine.
# Do not set values here unless you understand the impact of doing so as you
# may stop the virtual machine from booting.
# To see the list of default parameters, enable hypervisor debug, create a
# container and look for 'default-kernel-parameters' log entries.
kernel_params = ""

# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""

# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""

# Default number of vCPUs per POD/VM:
# unspecified or 0                --> will be set to 1
# < 0                             --> will be set to the actual number of physical cores
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores
default_vcpus = 1


# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Until 30 devices per bridge can be hot plugged.
# * Until 5 PCI bridges can be cold plugged per VM.
#   This limitation could be a bug in qemu or in the kernel
# Default number of bridges per POD/VM:
# unspecified or 0   --> will be set to 1
# > 1 <= 5           --> will be set to the specified number
# > 5                --> will be set to 5
default_bridges = 1

# Default memory size in MiB for POD/VM.
# If unspecified then it will be set 2048 MiB.
#default_memory = 2048

# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's 
# root file system is backed by a block device, the block device is passed
# directly to the hypervisor for performance reasons. 
# This flag prevents the block device from being passed to the hypervisor, 
# 9pfs is used instead to pass the rootfs.
disable_block_device_use = false

# Block storage driver to be used for the hypervisor in case the container
# rootfs is backed by a block device. This is either virtio-scsi or 
# virtio-blk.
block_device_driver = "virtio-scsi"

# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true

# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically 
# result in memory pre allocation
#enable_hugepages = true

# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true

# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. This extra output is added
# to the proxy logs, but only when proxy debug is also enabled.
# 
# Default false
#enable_debug = true

# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
# 
#disable_nesting_checks = true

[proxy.cc]
path = "/usr/libexec/clear-containers/cc-proxy"

# If enabled, proxy messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[shim.cc]
path = "/usr/libexec/clear-containers/cc-shim"

# If enabled, shim messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[agent.cc]
# There is no field for this section. The goal is only to be able to
# specify which type of agent the user wants to use.

[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
#
# Internetworking model
# Determines how the VM should be connected to the
# the container network interface
# Options:
#
#   - bridged
#     Uses a linux bridge to interconnect the container interface to
#     the VM. Works for most cases except macvlan and ipvlan.
#
#   - macvtap
#     Used when the Container network interface can be bridged using
#     macvtap.
internetworking_model="bridged"

Agent

version:

unknown

Logfiles

Runtime logs

/usr/bin/cc-collect-data.sh: line 242: journalctl: command not found
No recent runtime problems found in system journal.

Proxy logs

/usr/bin/cc-collect-data.sh: line 242: journalctl: command not found
No recent proxy problems found in system journal.

Shim logs

/usr/bin/cc-collect-data.sh: line 242: journalctl: command not found
No recent shim problems found in system journal.


Container manager details

Have docker

Docker

Output of "docker version":

Client:
 Version:	17.12.0-ce
 API version:	1.35
 Go version:	go1.9.2
 Git commit:	c97c6d6
 Built:	Wed Dec 27 20:11:19 2017
 OS/Arch:	linux/amd64

Server:
 Engine:
  Version:	17.12.0-ce
  API version:	1.35 (minimum version 1.12)
  Go version:	go1.9.2
  Git commit:	c97c6d6
  Built:	Wed Dec 27 20:09:54 2017
  OS/Arch:	linux/amd64
  Experimental:	false

Output of "docker info":

Containers: 3
 Running: 3
 Paused: 0
 Stopped: 0
Images: 4
Server Version: 17.12.0-ce
Storage Driver: zfs
 Zpool: error while getting pool information strconv.ParseUint: parsing "": invalid syntax
 Zpool Health: not available
 Parent Dataset: zpool2/docker
 Space Used By Parent: 3466528128
 Space Available: 3851731992320
 Parent Quota: no
 Compression: lz4
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: cc-runtime runc
Default Runtime: cc-runtime
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: 64d2226 (expected: b2567b37d7b75eb4cf325b77297b140ea686ce8f)
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.0-6-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 40
Total Memory: 376.6GiB
Name: <redacted>
ID: <redacted>
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 42
 Goroutines: 54
 System Time: 2018-04-13T19:42:44.065725616+10:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Output of "systemctl show docker":

/usr/bin/cc-collect-data.sh: line 167: systemctl: command not found

No kubectl


Packages

Have dpkg
Output of "dpkg -l|egrep "(cc-oci-runtime|cc-proxy|cc-runtime|cc-shim|kata-proxy|kata-runtime|kata-shim|clear-containers-image|linux-container|qemu-lite|qemu-system-x86)"":

ii  cc-proxy                             3.0.23+git.3cebe5e-27                   amd64        
ii  cc-runtime                           3.0.23+git.64d2226-27                   amd64        
ii  cc-runtime-bin                       3.0.23+git.64d2226-27                   amd64        
ii  cc-runtime-config                    3.0.23+git.64d2226-27                   amd64        
ii  cc-shim                              3.0.23+git.205ecf7-27                   amd64        
ii  clear-containers-image               20640-48                                amd64        Clear containers image
ii  linux-container                      4.14.22-86                              amd64        linux kernel optimised for container-like workloads.
ii  qemu-lite                            2.7.1+git.d4a337fe91-11                 amd64        linux kernel optimised for container-like workloads.
ii  qemu-system-x86                      1:2.8+dfsg-6+deb9u3                     amd64        QEMU full system emulation binaries (x86)

Have rpm
Output of "rpm -qa|egrep "(cc-oci-runtime|cc-proxy|cc-runtime|cc-shim|kata-proxy|kata-runtime|kata-shim|clear-containers-image|linux-container|qemu-lite|qemu-system-x86)"":


Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions