Skip to content

Commit d5bc4e3

Browse files
committed
Merge patch series "statmount: accept fd as a parameter"
Bhavik Sachdev <b.sachdev1904@gmail.com> says: We would like to add support for checkpoint/restoring file descriptors open on these "unmounted" mounts to CRIU (Checkpoint/Restore in Userspace) [1]. Currently, we have no way to get mount info for these "unmounted" mounts since they do appear in /proc/<pid>/mountinfo and statmount does not work on them, since they do not belong to any mount namespace. This patch helps us by providing a way to get mountinfo for these "unmounted" mounts by using a fd on the mount. Changes from v6 [2] to v7: * Add kselftests for STATMOUNT_BY_FD flag. * Instead of renaming mnt_id_req.mnt_ns_fd to mnt_id_req.fd introduce a union so struct mnt_id_req looks like this: struct mnt_id_req { __u32 size; union { __u32 mnt_ns_fd; __u32 mnt_fd; }; __u64 mnt_id; __u64 param; __u64 mnt_ns_id; }; * In case of STATMOUNT_BY_FD grab mnt_ns inside of do_statmount(), since we get mnt_ns from mnt, which should happen under namespace lock. * Remove the modifications made to grab_requested_mnt_ns, those were never needed. Changes from v5 [3] to v6: * Instead of returning "[unmounted]" as the mount point for "unmounted" mounts, we unset the STATMOUNT_MNT_POINT flag in statmount.mask. * Instead of returning 0 as the mnt_ns_id for "unmounted" mounts, we unset the STATMOUNT_MNT_NS_ID flag in statmount.mask. * Added comment in `do_statmount` clarifying that the caller sets s->mnt in case of STATMOUNT_BY_FD. * In `do_statmount` move the mnt_ns_id and mnt_ns_empty() check just before lookup_mnt_in_ns(). * We took another look at the capability checks for getting information for "unmounted" mounts using an fd and decided to remove them for the following reasons: - All fs related information is available via fstatfs() without any capability check. - Mount information is also available via /proc/pid/mountinfo (without any capability check). - Given that we have access to a fd on the mount which tells us that we had access to the mount at some point (or someone that had access gave us the fd). So, we should be able to access mount info. Changes from v4 [4] to v5: Check only for s->root.mnt to be NULL instead of checking for both s->root.mnt and s->root.dentry (I did not find a case where only one of them would be NULL). * Only allow system root (CAP_SYS_ADMIN in init_user_ns) to call statmount() on fd's on "unmounted" mounts. We (mostly Pavel) spent some time thinking about how our previous approach (of checking the opener's file credentials) caused problems. Please take a look at the linked pictures they describe everything more clearly. Case 1: A fd is on a normal mount (Link to Picture: [5]) Consider, a situation where we have two processes P1 and P2 and a file F1. F1 is opened on mount ns M1 by P1. P1 is nested inside user namespace U1 and U2. P2 is also in U1. P2 is also in a pid namespace and mount namespace separate from M1. P1 sends F1 to P2 (using a unix socket). But, P2 is unable to call statmount() on F1 because since it is a separate pid and mount namespace. This is good and expected. Case 2: A fd is on a "unmounted" mount (Link to Picture: [6]) Consider a similar situation as Case 1. But now F1 is on a mounted that has been "unmounted". Now, since we used openers credentials to check for permissions P2 ends up having the ability call statmount() and get mount info for this "unmounted" mount. Hence, It is better to restrict the ability to call statmount() on fds on "unmounted" mounts to system root only (There could also be other cases than the one described above). Changes from v3 [7] to v4: * Change the string returned when there is no mountpoint to be "[unmounted]" instead of "[detached]". * Remove the new DEFINE_FREE put_file and use the one already present in include/linux/file.h (fput) [8]. * Inside listmount consistently pass 0 in flags to copy_mnt_id_req and prepare_klistmount()->grab_requested_mnt_ns() and remove flags from the prepare_klistmount prototype. * If STATMOUNT_BY_FD is set, check for mnt_ns_id == 0 && mnt_id == 0. Changes from v2 [9] to v3: * Rename STATMOUNT_FD flag to STATMOUNT_BY_FD. * Fixed UAF bug caused by the reference to fd_mount being bound by scope of CLASS(fd_raw, f)(kreq.fd) by using fget_raw instead. * Reused @Spare parameter in mnt_id_req instead of adding new fields to the struct. Changes from v1 [10] to v2: v1 of this patchset, took a different approach and introduced a new umount_mnt_ns, to which "unmounted" mounts would be moved to (instead of their namespace being NULL) thus allowing them to be still available via statmount. Introducing umount_mnt_ns complicated namespace locking and modified performance sensitive code [11] and it was agreed upon that fd-based statmount would be better. This code is also available on github [12]. [1]: checkpoint-restore/criu#2754 [2]: https://lore.kernel.org/all/20251118084836.2114503-1-b.sachdev1904@gmail.com/ [3]: https://lore.kernel.org/criu/20251109053921.1320977-2-b.sachdev1904@gmail.com/T/#u [4]: https://lore.kernel.org/all/20251029052037.506273-2-b.sachdev1904@gmail.com/ [5]: https://github.com/bsach64/linux/blob/statmount-fd-v5/fd_on_normal_mount.png [6]: https://github.com/bsach64/linux/blob/statmount-fd-v5/file_on_unmounted_mount.png [7]: https://lore.kernel.org/all/20251024181443.786363-1-b.sachdev1904@gmail.com/ [8]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/file.h#n97 [9]: https://lore.kernel.org/linux-fsdevel/20251011124753.1820802-1-b.sachdev1904@gmail.com/ [10]: https://lore.kernel.org/linux-fsdevel/20251002125422.203598-1-b.sachdev1904@gmail.com/ [11]: https://lore.kernel.org/linux-fsdevel/7e4d9eb5-6dde-4c59-8ee3-358233f082d0@virtuozzo.com/ [12]: https://github.com/bsach64/linux/tree/statmount-fd-v7 * patches from https://patch.msgid.link/20251129091455.757724-1-b.sachdev1904@gmail.com: selftests: statmount: tests for STATMOUNT_BY_FD statmount: accept fd as a parameter statmount: permission check should return EPERM Link: https://patch.msgid.link/20251129091455.757724-1-b.sachdev1904@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2 parents 8f0b4cc + 0c82fdb commit d5bc4e3

File tree

5 files changed

+430
-59
lines changed

5 files changed

+430
-59
lines changed

fs/namespace.c

Lines changed: 67 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -5547,31 +5547,49 @@ static int grab_requested_root(struct mnt_namespace *ns, struct path *root)
55475547

55485548
/* locks: namespace_shared */
55495549
static int do_statmount(struct kstatmount *s, u64 mnt_id, u64 mnt_ns_id,
5550-
struct mnt_namespace *ns)
5550+
struct file *mnt_file, struct mnt_namespace *ns)
55515551
{
5552-
struct mount *m;
55535552
int err;
55545553

5555-
/* Has the namespace already been emptied? */
5556-
if (mnt_ns_id && mnt_ns_empty(ns))
5557-
return -ENOENT;
5554+
if (mnt_file) {
5555+
WARN_ON_ONCE(ns != NULL);
55585556

5559-
s->mnt = lookup_mnt_in_ns(mnt_id, ns);
5560-
if (!s->mnt)
5561-
return -ENOENT;
5557+
s->mnt = mnt_file->f_path.mnt;
5558+
ns = real_mount(s->mnt)->mnt_ns;
5559+
if (!ns)
5560+
/*
5561+
* We can't set mount point and mnt_ns_id since we don't have a
5562+
* ns for the mount. This can happen if the mount is unmounted
5563+
* with MNT_DETACH.
5564+
*/
5565+
s->mask &= ~(STATMOUNT_MNT_POINT | STATMOUNT_MNT_NS_ID);
5566+
} else {
5567+
/* Has the namespace already been emptied? */
5568+
if (mnt_ns_id && mnt_ns_empty(ns))
5569+
return -ENOENT;
55625570

5563-
err = grab_requested_root(ns, &s->root);
5564-
if (err)
5565-
return err;
5571+
s->mnt = lookup_mnt_in_ns(mnt_id, ns);
5572+
if (!s->mnt)
5573+
return -ENOENT;
5574+
}
55665575

5567-
/*
5568-
* Don't trigger audit denials. We just want to determine what
5569-
* mounts to show users.
5570-
*/
5571-
m = real_mount(s->mnt);
5572-
if (!is_path_reachable(m, m->mnt.mnt_root, &s->root) &&
5573-
!ns_capable_noaudit(ns->user_ns, CAP_SYS_ADMIN))
5574-
return -EPERM;
5576+
if (ns) {
5577+
err = grab_requested_root(ns, &s->root);
5578+
if (err)
5579+
return err;
5580+
5581+
if (!mnt_file) {
5582+
struct mount *m;
5583+
/*
5584+
* Don't trigger audit denials. We just want to determine what
5585+
* mounts to show users.
5586+
*/
5587+
m = real_mount(s->mnt);
5588+
if (!is_path_reachable(m, m->mnt.mnt_root, &s->root) &&
5589+
!ns_capable_noaudit(ns->user_ns, CAP_SYS_ADMIN))
5590+
return -EPERM;
5591+
}
5592+
}
55755593

55765594
err = security_sb_statfs(s->mnt->mnt_root);
55775595
if (err)
@@ -5693,7 +5711,7 @@ static int prepare_kstatmount(struct kstatmount *ks, struct mnt_id_req *kreq,
56935711
}
56945712

56955713
static int copy_mnt_id_req(const struct mnt_id_req __user *req,
5696-
struct mnt_id_req *kreq)
5714+
struct mnt_id_req *kreq, unsigned int flags)
56975715
{
56985716
int ret;
56995717
size_t usize;
@@ -5711,11 +5729,17 @@ static int copy_mnt_id_req(const struct mnt_id_req __user *req,
57115729
ret = copy_struct_from_user(kreq, sizeof(*kreq), req, usize);
57125730
if (ret)
57135731
return ret;
5714-
if (kreq->mnt_ns_fd != 0 && kreq->mnt_ns_id)
5715-
return -EINVAL;
5716-
/* The first valid unique mount id is MNT_UNIQUE_ID_OFFSET + 1. */
5717-
if (kreq->mnt_id <= MNT_UNIQUE_ID_OFFSET)
5718-
return -EINVAL;
5732+
5733+
if (flags & STATMOUNT_BY_FD) {
5734+
if (kreq->mnt_id || kreq->mnt_ns_id)
5735+
return -EINVAL;
5736+
} else {
5737+
if (kreq->mnt_ns_fd != 0 && kreq->mnt_ns_id)
5738+
return -EINVAL;
5739+
/* The first valid unique mount id is MNT_UNIQUE_ID_OFFSET + 1. */
5740+
if (kreq->mnt_id <= MNT_UNIQUE_ID_OFFSET)
5741+
return -EINVAL;
5742+
}
57195743
return 0;
57205744
}
57215745

@@ -5762,25 +5786,33 @@ SYSCALL_DEFINE4(statmount, const struct mnt_id_req __user *, req,
57625786
{
57635787
struct mnt_namespace *ns __free(mnt_ns_release) = NULL;
57645788
struct kstatmount *ks __free(kfree) = NULL;
5789+
struct file *mnt_file __free(fput) = NULL;
57655790
struct mnt_id_req kreq;
57665791
/* We currently support retrieval of 3 strings. */
57675792
size_t seq_size = 3 * PATH_MAX;
57685793
int ret;
57695794

5770-
if (flags)
5795+
if (flags & ~STATMOUNT_BY_FD)
57715796
return -EINVAL;
57725797

5773-
ret = copy_mnt_id_req(req, &kreq);
5798+
ret = copy_mnt_id_req(req, &kreq, flags);
57745799
if (ret)
57755800
return ret;
57765801

5777-
ns = grab_requested_mnt_ns(&kreq);
5778-
if (IS_ERR(ns))
5779-
return PTR_ERR(ns);
5802+
if (flags & STATMOUNT_BY_FD) {
5803+
mnt_file = fget_raw(kreq.mnt_fd);
5804+
if (!mnt_file)
5805+
return -EBADF;
5806+
/* do_statmount sets ns in case of STATMOUNT_BY_FD */
5807+
} else {
5808+
ns = grab_requested_mnt_ns(&kreq);
5809+
if (IS_ERR(ns))
5810+
return PTR_ERR(ns);
57805811

5781-
if (kreq.mnt_ns_id && (ns != current->nsproxy->mnt_ns) &&
5782-
!ns_capable_noaudit(ns->user_ns, CAP_SYS_ADMIN))
5783-
return -ENOENT;
5812+
if (kreq.mnt_ns_id && (ns != current->nsproxy->mnt_ns) &&
5813+
!ns_capable_noaudit(ns->user_ns, CAP_SYS_ADMIN))
5814+
return -EPERM;
5815+
}
57845816

57855817
ks = kmalloc(sizeof(*ks), GFP_KERNEL_ACCOUNT);
57865818
if (!ks)
@@ -5792,7 +5824,7 @@ SYSCALL_DEFINE4(statmount, const struct mnt_id_req __user *, req,
57925824
return ret;
57935825

57945826
scoped_guard(namespace_shared)
5795-
ret = do_statmount(ks, kreq.mnt_id, kreq.mnt_ns_id, ns);
5827+
ret = do_statmount(ks, kreq.mnt_id, kreq.mnt_ns_id, mnt_file, ns);
57965828

57975829
if (!ret)
57985830
ret = copy_statmount_to_user(ks);
@@ -5932,7 +5964,7 @@ SYSCALL_DEFINE4(listmount, const struct mnt_id_req __user *, req,
59325964
if (!access_ok(mnt_ids, nr_mnt_ids * sizeof(*mnt_ids)))
59335965
return -EFAULT;
59345966

5935-
ret = copy_mnt_id_req(req, &kreq);
5967+
ret = copy_mnt_id_req(req, &kreq, 0);
59365968
if (ret)
59375969
return ret;
59385970

include/uapi/linux/mount.h

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -197,7 +197,10 @@ struct statmount {
197197
*/
198198
struct mnt_id_req {
199199
__u32 size;
200-
__u32 mnt_ns_fd;
200+
union {
201+
__u32 mnt_ns_fd;
202+
__u32 mnt_fd;
203+
};
201204
__u64 mnt_id;
202205
__u64 param;
203206
__u64 mnt_ns_id;
@@ -232,4 +235,9 @@ struct mnt_id_req {
232235
#define LSMT_ROOT 0xffffffffffffffff /* root mount */
233236
#define LISTMOUNT_REVERSE (1 << 0) /* List later mounts first */
234237

238+
/*
239+
* @flag bits for statmount(2)
240+
*/
241+
#define STATMOUNT_BY_FD 0x00000001U /* want mountinfo for given fd */
242+
235243
#endif /* _UAPI_LINUX_MOUNT_H */

tools/testing/selftests/filesystems/statmount/statmount.h

Lines changed: 10 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -43,19 +43,24 @@
4343
#endif
4444
#endif
4545

46-
static inline int statmount(uint64_t mnt_id, uint64_t mnt_ns_id, uint64_t mask,
47-
struct statmount *buf, size_t bufsize,
46+
static inline int statmount(uint64_t mnt_id, uint64_t mnt_ns_id, uint32_t fd,
47+
uint64_t mask, struct statmount *buf, size_t bufsize,
4848
unsigned int flags)
4949
{
5050
struct mnt_id_req req = {
5151
.size = MNT_ID_REQ_SIZE_VER0,
52-
.mnt_id = mnt_id,
5352
.param = mask,
5453
};
5554

56-
if (mnt_ns_id) {
55+
if (flags & STATMOUNT_BY_FD) {
5756
req.size = MNT_ID_REQ_SIZE_VER1;
58-
req.mnt_ns_id = mnt_ns_id;
57+
req.mnt_fd = fd;
58+
} else {
59+
req.mnt_id = mnt_id;
60+
if (mnt_ns_id) {
61+
req.size = MNT_ID_REQ_SIZE_VER1;
62+
req.mnt_ns_id = mnt_ns_id;
63+
}
5964
}
6065

6166
return syscall(__NR_statmount, &req, buf, bufsize, flags);

0 commit comments

Comments
 (0)