
ZFS space usage is double-counted #3832

@leapwill

Description


Starting in 0.55.0, container_fs_usage_bytes has been wrong for ZFS on my systems, reporting much higher usage than it should. Digging in, container_fs_inodes_* also stopped working at the same time. The behavior is the same in 0.56.2, and it worked as expected in 0.54.1 and every earlier version I tested.

The ZFS stats, now in zfs/stats.go, double-count the space used by the current dataset. The offending line:

```go
total := dataset.Used + dataset.Avail + dataset.Usedbydataset
```

(total becomes FsStats.Capacity, and Avail is the only other property returned.)

From man zfsprops:

> usedby*
> The usedby* properties decompose the used properties into the various reasons that space is used. Specifically, used = usedbychildren + usedbydataset + usedbyrefreservation + usedbysnapshots. These properties are only available for datasets created on zpool "version 13" pools.

Summing used together with any of the usedby* properties double-counts space: usedbydataset is already part of used (it is the data held by the current dataset itself). So, given this layout and these sizes of data written to each dataset:

home/ - 10G
└ alice/ - 15G

Then the usage reported by VFS and ZFS would be:

|             | VFS | ZFS |
|-------------|-----|-----|
| home/       | 10G | 35G |
| home/alice/ | 15G | 30G |
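The arithmetic behind the ZFS column can be sketched as below. This is not cadvisor's code, just the quoted sum applied to the example numbers (snapshots and reservations assumed zero, and Avail set to 0 to isolate the used side):

```go
package main

import "fmt"

// buggyCapacity mirrors the sum from zfs/stats.go:
// total := dataset.Used + dataset.Avail + dataset.Usedbydataset
func buggyCapacity(used, avail, usedByDataset uint64) uint64 {
	return used + avail + usedByDataset
}

func main() {
	// home holds 10G of its own data, home/alice holds 15G, so per
	// man zfsprops: used(home) = usedbydataset + usedbychildren = 10 + 15.
	fmt.Println(buggyCapacity(25, 0, 10)) // home:  35G reported vs 10G via VFS
	fmt.Println(buggyCapacity(15, 0, 15)) // alice: 30G reported vs 15G via VFS
}
```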

Part of me feels there must be something I'm missing, as the line in question hasn't changed since e0fef76, 11 years ago.

How I found it

#3794 did a little more than its name suggests: it moved filesystem stats from fs.go into FS-specific plugins, but it also made some small changes, such as rewriting the /dev/zfs check that decides between ZFS and VFS stats. The old code checked whether /dev/zfs exists and, if so, used ZFS; the new code checks whether /dev/zfs does not exist and, if so, uses VFS. Those sound equivalent, but my system was using the VFS code for my ZFS filesystems on 0.54.1 and earlier and started using the ZFS code in 0.55.0. This seems like a bug that was unreported but (silently) fixed. (Side note: this was tricky to figure out because there is no logging on the successful code path, only on failures; I resorted to strace to confirm the behavior. It is also not dependent on explicitly mounting /dev/zfs, though I am running privileged.)

Possible solutions

  1. Remove Usedbydataset from the sum, leaving Capacity as just used + available. This counts all child filesystems' space, meaning "usage" in a tree is counted multiple times.
  2. Remove Used, leaving current dataset + available. This leaves children's space to be accounted for in their own metrics, but ZFS's available includes all children. In the above example, on a 50G drive/quota, the available space would be reported as 25G for both datasets (so available is counted multiple times).
  3. An oddball: compute a sum of the usedby* properties that excludes usedbyrefreservation, so that only space consumed by actual data counts as used. This would leave reservations entirely unaccounted for.
  4. Revisit "Why not use statfs for ZFS?" (#1884) and just (intentionally) use VFS for ZFS, since it seems to be working now and collects additional stats (inodes).
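Option 1 is the smallest change. A minimal sketch, assuming a Dataset struct with the same field names as the line quoted above:

```go
package main

import "fmt"

// Dataset mirrors the fields consumed in zfs/stats.go (names assumed
// from the quoted line, not copied from the actual source).
type Dataset struct {
	Used, Avail, Usedbydataset uint64
}

// capacity drops Usedbydataset from the sum, so Capacity is simply
// used + available as ZFS itself reports for the dataset.
func capacity(d Dataset) uint64 {
	return d.Used + d.Avail
}

func main() {
	// home from the example, on the 50G drive: 25G used in the tree,
	// 25G available, 10G of it the dataset's own data.
	home := Dataset{Used: 25, Avail: 25, Usedbydataset: 10}
	fmt.Println(capacity(home)) // 50, instead of the double-counted 60
}
```

As the list above notes, this still counts each child's usage again in every ancestor, but it stops any single dataset's Capacity from exceeding the pool.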
