Message ID | 20240813-work-f_owner-v2-1-4e9343a79f9f@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] file: reclaim 24 bytes from f_owner | expand |
On Tue, Aug 13, 2024 at 2:31 PM Christian Brauner <brauner@kernel.org> wrote: > -int send_sigurg(struct fown_struct *fown) > +int send_sigurg(struct file *file) > { > + struct fown_struct *fown; > struct task_struct *p; > enum pid_type type; > struct pid *pid; > unsigned long flags; > int ret = 0; > > + fown = file_f_owner(file); > + if (fown) > + return 0; > + if (!fown) ? > read_lock_irqsave(&fown->lock, flags); > > type = fown->pid_type; > @@ -1027,13 +1109,16 @@ static void kill_fasync_rcu(struct fasync_struct *fa, int sig, int band)
On Tue, Aug 13, 2024 at 02:42:11PM GMT, Mateusz Guzik wrote: > On Tue, Aug 13, 2024 at 2:31 PM Christian Brauner <brauner@kernel.org> wrote: > > -int send_sigurg(struct fown_struct *fown) > > +int send_sigurg(struct file *file) > > { > > + struct fown_struct *fown; > > struct task_struct *p; > > enum pid_type type; > > struct pid *pid; > > unsigned long flags; > > int ret = 0; > > > > + fown = file_f_owner(file); > > + if (fown) > > + return 0; > > + > > if (!fown) ? Bah, I had fixed that in-tree. Thanks!
On Tue, 2024-08-13 at 14:30 +0200, Christian Brauner wrote: > We do embedd struct fown_struct into struct file letting it take up 32 > bytes in total. We could tweak struct fown_struct to be more compact but > really it shouldn't even be embedded in struct file in the first place. > > Instead, actual users of struct fown_struct should allocate the struct > on demand. This frees up 24 bytes in struct file. > > That will have some potentially user-visible changes for the ownership > fcntl()s. Some of them can now fail due to allocation failures. > Practically, that probably will almost never happen as the allocations > are small and they only happen once per file. > > The fown_struct is used during kill_fasync() which is used by e.g., > pipes to generate a SIGIO signal. Sending of such signals is conditional > on userspace having set an owner for the file using one of the F_OWNER > fcntl()s. Such users will be unaffected if struct fown_struct is > allocated during the fcntl() call. > > There are a few subsystems that call __f_setown() expecting > file->f_owner to be allocated: > > (1) tun devices > file->f_op->fasync::tun_chr_fasync() > -> __f_setown() > > There are no callers of tun_chr_fasync(). > > (2) tty devices > > file->f_op->fasync::tty_fasync() > -> __tty_fasync() > -> __f_setown() > > tty_fasync() has no additional callers but __tty_fasync() has. Note > that __tty_fasync() only calls __f_setown() if the @on argument is > true. It's called from: > > file->f_op->release::tty_release() > -> tty_release() > -> __tty_fasync() > -> __f_setown() > > tty_release() calls __tty_fasync() with @on false > => __f_setown() is never called from tty_release(). > => All callers of tty_release() are safe as well. > > file->f_op->release::tty_open() > -> tty_release() > -> __tty_fasync() > -> __f_setown() > > __tty_hangup() calls __tty_fasync() with @on false > => __f_setown() is never called from tty_release(). > => All callers of __tty_hangup() are safe as well. > > From the callchains it's obvious that (1) and (2) end up getting called > via file->f_op->fasync(). That can happen either through the F_SETFL > fcntl() with the FASYNC flag raised or via the FIOASYNC ioctl(). If > FASYNC is requested and the file isn't already FASYNC then > file->f_op->fasync() is called with @on true which ends up causing both > (1) and (2) to call __f_setown(). > > (1) and (2) are the only subsystems that call __f_setown() from the > file->f_op->fasync() handler. So both (1) and (2) have been updated to > allocate a struct fown_struct prior to calling fasync_helper() to > register with the fasync infrastructure. That's safe as they both call > fasync_helper() which also does allocations if @on is true. > > The other interesting case are file leases: > > (3) file leases > lease_manager_ops->lm_setup::lease_setup() > -> __f_setown() > > Which in turn is called from: > > generic_add_lease() > -> lease_manager_ops->lm_setup::lease_setup() > -> __f_setown() > > So here again we can simply make generic_add_lease() allocate struct > fown_struct prior to the lease_manager_ops->lm_setup::lease_setup() > which happens under a spinlock. > > With that the two remaining subsystems that call __f_setown() are: > > (4) dnotify > (5) sockets > > Both have their own custom ioctls to set struct fown_struct and both > have been converted to allocate a struct fown_struct on demand from > their respective ioctls. > > Interactions with O_PATH are fine as well e.g., when opening a /dev/tty > as O_PATH then no file->f_op->open() happens thus no file->f_owner is > allocated. That's fine as no file operation will be set for those and > the device has never been opened. fcntl()s called on such things will > just allocate a ->f_owner on demand. Although I have zero idea why'd you > care about f_owner on an O_PATH fd. > > Signed-off-by: Christian Brauner <brauner@kernel.org> > --- > * There's no more cleanup macros used in this version as we can solve > that all much simpler. > * Survives LTP which tests a bunch of that stuff. > * Survives perf's watermak signal tests which make use of FASYNC. > > Goes into -next unless I hear objections. > --- > > --- > drivers/net/tun.c | 6 ++ > drivers/tty/tty_io.c | 6 ++ > fs/fcntl.c | 153 ++++++++++++++++++++++++++++++++++---------- > fs/file_table.c | 6 +- > fs/locks.c | 6 +- > fs/notify/dnotify/dnotify.c | 6 +- > include/linux/fs.h | 11 +++- > net/core/sock.c | 2 +- > security/selinux/hooks.c | 2 +- > security/smack/smack_lsm.c | 2 +- > 10 files changed, 157 insertions(+), 43 deletions(-) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index 1d06c560c5e6..6fe5e8f7017c 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -3451,6 +3451,12 @@ static int tun_chr_fasync(int fd, struct file *file, int on) > struct tun_file *tfile = file->private_data; > int ret; > > + if (on) { > + ret = file_f_owner_allocate(file); > + if (ret) > + goto out; > + } > + > if ((ret = fasync_helper(fd, file, on, &tfile->fasync)) < 0) > goto out; > > diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c > index 407b0d87b7c1..7ae0c8934f42 100644 > --- a/drivers/tty/tty_io.c > +++ b/drivers/tty/tty_io.c > @@ -2225,6 +2225,12 @@ static int __tty_fasync(int fd, struct file *filp, int on) > if (tty_paranoia_check(tty, file_inode(filp), "tty_fasync")) > goto out; > > + if (on) { > + retval = file_f_owner_allocate(filp); > + if (retval) > + goto out; > + } > + > retval = fasync_helper(fd, filp, on, &tty->fasync); > if (retval <= 0) > goto out; > diff --git a/fs/fcntl.c b/fs/fcntl.c > index 300e5d9ad913..b002730fdcd1 100644 > --- a/fs/fcntl.c > +++ b/fs/fcntl.c > @@ -87,22 +87,53 @@ static int setfl(int fd, struct file * filp, unsigned int arg) > return error; > } > > +/* > + * Allocate an file->f_owner struct if it doesn't exist, handling racing > + * allocations correctly. > + */ > +int file_f_owner_allocate(struct file *file) > +{ > + struct fown_struct *f_owner; > + > + f_owner = file_f_owner(file); > + if (f_owner) > + return 0; > + > + f_owner = kzalloc(sizeof(struct fown_struct), GFP_KERNEL); > + if (!f_owner) > + return -ENOMEM; > + > + rwlock_init(&f_owner->lock); > + f_owner->file = file; > + /* If someone else raced us, drop our allocation. */ > + if (unlikely(cmpxchg(&file->f_owner, NULL, f_owner))) nit: try_cmpxchg generates better asm and should be fine here. > + kfree(f_owner); > + return 0; > +} > +EXPORT_SYMBOL(file_f_owner_allocate); > + > static void f_modown(struct file *filp, struct pid *pid, enum pid_type type, > int force) > { > - write_lock_irq(&filp->f_owner.lock); > - if (force || !filp->f_owner.pid) { > - put_pid(filp->f_owner.pid); > - filp->f_owner.pid = get_pid(pid); > - filp->f_owner.pid_type = type; > + struct fown_struct *f_owner; > + > + f_owner = file_f_owner(filp); > + if (WARN_ON_ONCE(!f_owner)) > + return; > + > + write_lock_irq(&f_owner->lock); > + if (force || !f_owner->pid) { > + put_pid(f_owner->pid); > + f_owner->pid = get_pid(pid); > + f_owner->pid_type = type; > > if (pid) { > const struct cred *cred = current_cred(); > - filp->f_owner.uid = cred->uid; > - filp->f_owner.euid = cred->euid; > + f_owner->uid = cred->uid; > + f_owner->euid = cred->euid; > } > } > - write_unlock_irq(&filp->f_owner.lock); > + write_unlock_irq(&f_owner->lock); > } > > void __f_setown(struct file *filp, struct pid *pid, enum pid_type type, > @@ -119,6 +150,8 @@ int f_setown(struct file *filp, int who, int force) > struct pid *pid = NULL; > int ret = 0; > > + might_sleep(); > + > type = PIDTYPE_TGID; > if (who < 0) { > /* avoid overflow below */ > @@ -129,6 +162,10 @@ int f_setown(struct file *filp, int who, int force) > who = -who; > } > > + ret = file_f_owner_allocate(filp); > + if (ret) > + return ret; > + > rcu_read_lock(); > if (who) { > pid = find_vpid(who); > @@ -152,16 +189,21 @@ void f_delown(struct file *filp) > pid_t f_getown(struct file *filp) > { > pid_t pid = 0; > + struct fown_struct *f_owner; > > - read_lock_irq(&filp->f_owner.lock); > + f_owner = file_f_owner(filp); > + if (!f_owner) > + return pid; > + > + read_lock_irq(&f_owner->lock); > rcu_read_lock(); > - if (pid_task(filp->f_owner.pid, filp->f_owner.pid_type)) { > - pid = pid_vnr(filp->f_owner.pid); > - if (filp->f_owner.pid_type == PIDTYPE_PGID) > + if (pid_task(f_owner->pid, f_owner->pid_type)) { > + pid = pid_vnr(f_owner->pid); > + if (f_owner->pid_type == PIDTYPE_PGID) > pid = -pid; > } > rcu_read_unlock(); > - read_unlock_irq(&filp->f_owner.lock); > + read_unlock_irq(&f_owner->lock); > return pid; > } > > @@ -194,6 +236,10 @@ static int f_setown_ex(struct file *filp, unsigned long arg) > return -EINVAL; > } > > + ret = file_f_owner_allocate(filp); > + if (ret) > + return ret; > + > rcu_read_lock(); > pid = find_vpid(owner.pid); > if (owner.pid && !pid) > @@ -210,13 +256,20 @@ static int f_getown_ex(struct file *filp, unsigned long arg) > struct f_owner_ex __user *owner_p = (void __user *)arg; > struct f_owner_ex owner = {}; > int ret = 0; > + struct fown_struct *f_owner; > + enum pid_type pid_type = PIDTYPE_PID; > > - read_lock_irq(&filp->f_owner.lock); > - rcu_read_lock(); > - if (pid_task(filp->f_owner.pid, filp->f_owner.pid_type)) > - owner.pid = pid_vnr(filp->f_owner.pid); > - rcu_read_unlock(); > - switch (filp->f_owner.pid_type) { > + f_owner = file_f_owner(filp); > + if (f_owner) { > + read_lock_irq(&f_owner->lock); > + rcu_read_lock(); > + if (pid_task(f_owner->pid, f_owner->pid_type)) > + owner.pid = pid_vnr(f_owner->pid); > + rcu_read_unlock(); > + pid_type = f_owner->pid_type; > + } > + > + switch (pid_type) { > case PIDTYPE_PID: > owner.type = F_OWNER_TID; > break; > @@ -234,7 +287,8 @@ static int f_getown_ex(struct file *filp, unsigned long arg) > ret = -EINVAL; > break; > } > - read_unlock_irq(&filp->f_owner.lock); > + if (f_owner) > + read_unlock_irq(&f_owner->lock); > > if (!ret) { > ret = copy_to_user(owner_p, &owner, sizeof(owner)); > @@ -248,14 +302,18 @@ static int f_getown_ex(struct file *filp, unsigned long arg) > static int f_getowner_uids(struct file *filp, unsigned long arg) > { > struct user_namespace *user_ns = current_user_ns(); > + struct fown_struct *f_owner; > uid_t __user *dst = (void __user *)arg; > - uid_t src[2]; > + uid_t src[2] = {0, 0}; > int err; > > - read_lock_irq(&filp->f_owner.lock); > - src[0] = from_kuid(user_ns, filp->f_owner.uid); > - src[1] = from_kuid(user_ns, filp->f_owner.euid); > - read_unlock_irq(&filp->f_owner.lock); > + f_owner = file_f_owner(filp); > + if (f_owner) { > + read_lock_irq(&f_owner->lock); > + src[0] = from_kuid(user_ns, f_owner->uid); > + src[1] = from_kuid(user_ns, f_owner->euid); > + read_unlock_irq(&f_owner->lock); > + } > > err = put_user(src[0], &dst[0]); > err |= put_user(src[1], &dst[1]); > @@ -343,6 +401,30 @@ static long f_dupfd_query(int fd, struct file *filp) > return f.file == filp; > } > > +static int f_owner_sig(struct file *filp, int signum, bool setsig) > +{ > + int ret = 0; > + struct fown_struct *f_owner; > + > + might_sleep(); > + > + if (setsig) { > + if (!valid_signal(signum)) > + return -EINVAL; > + > + ret = file_f_owner_allocate(filp); > + if (ret) > + return ret; > + } > + > + f_owner = file_f_owner(filp); > + if (setsig) > + f_owner->signum = signum; > + else if (f_owner) > + ret = f_owner->signum; > + return ret; > +} > + > static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, > struct file *filp) > { > @@ -421,15 +503,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, > err = f_getowner_uids(filp, arg); > break; > case F_GETSIG: > - err = filp->f_owner.signum; > + err = f_owner_sig(filp, 0, false); > break; > case F_SETSIG: > - /* arg == 0 restores default behaviour. */ > - if (!valid_signal(argi)) { > - break; > - } > - err = 0; > - filp->f_owner.signum = argi; > + err = f_owner_sig(filp, argi, true); > break; > case F_GETLEASE: > err = fcntl_getlease(filp); > @@ -844,14 +921,19 @@ static void send_sigurg_to_task(struct task_struct *p, > do_send_sig_info(SIGURG, SEND_SIG_PRIV, p, type); > } > > -int send_sigurg(struct fown_struct *fown) > +int send_sigurg(struct file *file) > { > + struct fown_struct *fown; > struct task_struct *p; > enum pid_type type; > struct pid *pid; > unsigned long flags; > int ret = 0; > > + fown = file_f_owner(file); > + if (fown) > + return 0; This needs to be fixed (as Mateusz pointed out). > + > read_lock_irqsave(&fown->lock, flags); > > type = fown->pid_type; > @@ -1027,13 +1109,16 @@ static void kill_fasync_rcu(struct fasync_struct *fa, int sig, int band) > } > read_lock_irqsave(&fa->fa_lock, flags); > if (fa->fa_file) { > - fown = &fa->fa_file->f_owner; > + fown = file_f_owner(fa->fa_file); > + if (!fown) > + goto next; > /* Don't send SIGURG to processes which have not set a > queued signum: SIGURG has its own default signalling > mechanism. */ > if (!(sig == SIGURG && fown->signum == 0)) > send_sigio(fown, fa->fa_fd, band); > } > +next: > read_unlock_irqrestore(&fa->fa_lock, flags); > fa = rcu_dereference(fa->fa_next); > } > diff --git a/fs/file_table.c b/fs/file_table.c > index ca7843dde56d..41ff037a8dc9 100644 > --- a/fs/file_table.c > +++ b/fs/file_table.c > @@ -155,7 +155,6 @@ static int init_file(struct file *f, int flags, const struct cred *cred) > return error; > } > > - rwlock_init(&f->f_owner.lock); > spin_lock_init(&f->f_lock); > mutex_init(&f->f_pos_lock); > f->f_flags = flags; > @@ -425,7 +424,10 @@ static void __fput(struct file *file) > cdev_put(inode->i_cdev); > } > fops_put(file->f_op); > - put_pid(file->f_owner.pid); > + if (file->f_owner) { > + put_pid(file->f_owner->pid); > + kfree(file->f_owner); > + } > put_file_access(file); > dput(dentry); > if (unlikely(mode & FMODE_NEED_UNMOUNT)) > diff --git a/fs/locks.c b/fs/locks.c > index 9afb16e0683f..c0d312481b97 100644 > --- a/fs/locks.c > +++ b/fs/locks.c > @@ -1451,7 +1451,7 @@ int lease_modify(struct file_lease *fl, int arg, struct list_head *dispose) > struct file *filp = fl->c.flc_file; > > f_delown(filp); > - filp->f_owner.signum = 0; > + file_f_owner(filp)->signum = 0; > fasync_helper(0, fl->c.flc_file, 0, &fl->fl_fasync); > if (fl->fl_fasync != NULL) { > printk(KERN_ERR "locks_delete_lock: fasync == %p\n", fl->fl_fasync); > @@ -1783,6 +1783,10 @@ generic_add_lease(struct file *filp, int arg, struct file_lease **flp, void **pr > lease = *flp; > trace_generic_add_lease(inode, lease); > > + error = file_f_owner_allocate(filp); > + if (error) > + return error; > + > /* Note that arg is never F_UNLCK here */ > ctx = locks_get_lock_context(inode, arg); > if (!ctx) > diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c > index f3669403fabf..46440fbb8662 100644 > --- a/fs/notify/dnotify/dnotify.c > +++ b/fs/notify/dnotify/dnotify.c > @@ -110,7 +110,7 @@ static int dnotify_handle_event(struct fsnotify_mark *inode_mark, u32 mask, > prev = &dn->dn_next; > continue; > } > - fown = &dn->dn_filp->f_owner; > + fown = file_f_owner(dn->dn_filp); > send_sigio(fown, dn->dn_fd, POLL_MSG); > if (dn->dn_mask & FS_DN_MULTISHOT) > prev = &dn->dn_next; > @@ -316,6 +316,10 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned int arg) > goto out_err; > } > > + error = file_f_owner_allocate(filp); > + if (error) > + goto out_err; > + > /* set up the new_fsn_mark and new_dn_mark */ > new_fsn_mark = &new_dn_mark->fsn_mark; > fsnotify_init_mark(new_fsn_mark, dnotify_group); > diff --git a/include/linux/fs.h b/include/linux/fs.h > index fd34b5755c0b..319c566a9e98 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -947,6 +947,7 @@ static inline unsigned imajor(const struct inode *inode) > } > > struct fown_struct { > + struct file *file; /* backpointer for security modules */ This struct was 32 bytes before (on x86_64). Now it'll be 40. That's fine, but it may be worthwhile to create a dedicated slabcache for this now, since it's no longer a power-of-2 size. > rwlock_t lock; /* protects pid, uid, euid fields */ > struct pid *pid; /* pid or -pgrp where SIGIO should be sent */ > enum pid_type pid_type; /* Kind of process group SIGIO should be sent to */ > @@ -1011,7 +1012,7 @@ struct file { > struct mutex f_pos_lock; > loff_t f_pos; > unsigned int f_flags; > - struct fown_struct f_owner; > + struct fown_struct *f_owner; > const struct cred *f_cred; > struct file_ra_state f_ra; > struct path f_path; > @@ -1076,6 +1077,12 @@ struct file_lease; > #define OFFT_OFFSET_MAX type_max(off_t) > #endif > > +int file_f_owner_allocate(struct file *file); > +static inline struct fown_struct *file_f_owner(const struct file *file) > +{ > + return READ_ONCE(file->f_owner); > +} > + > extern void send_sigio(struct fown_struct *fown, int fd, int band); > > static inline struct inode *file_inode(const struct file *f) > @@ -1124,7 +1131,7 @@ extern void __f_setown(struct file *filp, struct pid *, enum pid_type, int force > extern int f_setown(struct file *filp, int who, int force); > extern void f_delown(struct file *filp); > extern pid_t f_getown(struct file *filp); > -extern int send_sigurg(struct fown_struct *fown); > +extern int send_sigurg(struct file *file); > > /* > * sb->s_flags. Note that these mirror the equivalent MS_* flags where > diff --git a/net/core/sock.c b/net/core/sock.c > index 9abc4fe25953..bbe4c58470c3 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -3429,7 +3429,7 @@ static void sock_def_destruct(struct sock *sk) > void sk_send_sigurg(struct sock *sk) > { > if (sk->sk_socket && sk->sk_socket->file) > - if (send_sigurg(&sk->sk_socket->file->f_owner)) > + if (send_sigurg(sk->sk_socket->file)) > sk_wake_async(sk, SOCK_WAKE_URG, POLL_PRI); > } > EXPORT_SYMBOL(sk_send_sigurg); > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c > index 55c78c318ccd..ed252cfba4e9 100644 > --- a/security/selinux/hooks.c > +++ b/security/selinux/hooks.c > @@ -3940,7 +3940,7 @@ static int selinux_file_send_sigiotask(struct task_struct *tsk, > struct file_security_struct *fsec; > > /* struct fown_struct is never outside the context of a struct file */ > - file = container_of(fown, struct file, f_owner); > + file = fown->file; > > fsec = selinux_file(file); > > diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c > index 4164699cd4f6..cb33920ab67c 100644 > --- a/security/smack/smack_lsm.c > +++ b/security/smack/smack_lsm.c > @@ -1950,7 +1950,7 @@ static int smack_file_send_sigiotask(struct task_struct *tsk, > /* > * struct fown_struct is never outside the context of a struct file > */ > - file = container_of(fown, struct file, f_owner); > + file = fown->file; > > /* we don't log here as rc can be overriden */ > blob = smack_file(file); > > --- > base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b > change-id: 20240813-work-f_owner-0fbbc50f9671 > Aside from the bug that Mateusz pointed out, this looks fine to me. Assuming you fix that bug: Reviewed-by: Jeff Layton <jlayton@kernel.org>
On Tue, Aug 13, 2024 at 3:02 PM Jeff Layton <jlayton@kernel.org> wrote: > > On Tue, 2024-08-13 at 14:30 +0200, Christian Brauner wrote: > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > index fd34b5755c0b..319c566a9e98 100644 > > --- a/include/linux/fs.h > > +++ b/include/linux/fs.h > > @@ -947,6 +947,7 @@ static inline unsigned imajor(const struct inode *inode) > > } > > > > struct fown_struct { > > + struct file *file; /* backpointer for security modules */ > > This struct was 32 bytes before (on x86_64). Now it'll be 40. That's > fine, but it may be worthwhile to create a dedicated slabcache for this > now, since it's no longer a power-of-2 size. > creating a dedicated slab would be a waste imo If someone is concerned with memory usage here, I note that plausibly the file pointer can be plumbed through sigio code, eliding any use of the newly added ->file However, a real daredevil would check if the thing to do is perhaps to add a 48-byte malloc slab. But that would require quite a bit of looking at allocations elsewhere, collecting stats and whatnot. Noting this just in case, I have negative interest in looking at this.
On Tue, Aug 13, 2024 at 3:42 PM Mateusz Guzik <mjguzik@gmail.com> wrote: > > On Tue, Aug 13, 2024 at 3:02 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > On Tue, 2024-08-13 at 14:30 +0200, Christian Brauner wrote: > > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > > index fd34b5755c0b..319c566a9e98 100644 > > > --- a/include/linux/fs.h > > > +++ b/include/linux/fs.h > > > @@ -947,6 +947,7 @@ static inline unsigned imajor(const struct inode *inode) > > > } > > > > > > struct fown_struct { > > > + struct file *file; /* backpointer for security modules */ > > > > This struct was 32 bytes before (on x86_64). Now it'll be 40. That's > > fine, but it may be worthwhile to create a dedicated slabcache for this > > now, since it's no longer a power-of-2 size. > > > > creating a dedicated slab would be a waste imo > > If someone is concerned with memory usage here, I note that plausibly > the file pointer can be plumbed through sigio code, eliding any use of > the newly added ->file > > However, a real daredevil would check if the thing to do is perhaps to > add a 48-byte malloc slab. But that would require quite a bit of > looking at allocations elsewhere, collecting stats and whatnot. Noting > this just in case, I have negative interest in looking at this. > Some massaging of the struct brings me back to 32 bytes despite the pointer: - rwlock -> spinlock - signum int->short - pid_type to 1 byte enum but I don't think making this work is necessary at this point struct fown_struct_poc { struct file * file; /* 0 8 */ spinlock_t lock; /* 8 4 */ short int signum; /* 12 2 */ /* Bitfield combined with previous fields */ enum pid_type pid_type:8; /* 12:16 4 */ /* XXX 8 bits hole, try to pack */ struct pid * pid; /* 16 8 */ kuid_t uid; /* 24 4 */ kuid_t euid; /* 28 4 */ /* size: 32, cachelines: 1, members: 7 */ /* sum members: 30 */ /* sum bitfield members: 8 bits, bit holes: 1, sum bit holes: 8 bits */ /* last cacheline: 32 bytes */ };
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 1d06c560c5e6..6fe5e8f7017c 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -3451,6 +3451,12 @@ static int tun_chr_fasync(int fd, struct file *file, int on) struct tun_file *tfile = file->private_data; int ret; + if (on) { + ret = file_f_owner_allocate(file); + if (ret) + goto out; + } + if ((ret = fasync_helper(fd, file, on, &tfile->fasync)) < 0) goto out; diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 407b0d87b7c1..7ae0c8934f42 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -2225,6 +2225,12 @@ static int __tty_fasync(int fd, struct file *filp, int on) if (tty_paranoia_check(tty, file_inode(filp), "tty_fasync")) goto out; + if (on) { + retval = file_f_owner_allocate(filp); + if (retval) + goto out; + } + retval = fasync_helper(fd, filp, on, &tty->fasync); if (retval <= 0) goto out; diff --git a/fs/fcntl.c b/fs/fcntl.c index 300e5d9ad913..b002730fdcd1 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -87,22 +87,53 @@ static int setfl(int fd, struct file * filp, unsigned int arg) return error; } +/* + * Allocate an file->f_owner struct if it doesn't exist, handling racing + * allocations correctly. + */ +int file_f_owner_allocate(struct file *file) +{ + struct fown_struct *f_owner; + + f_owner = file_f_owner(file); + if (f_owner) + return 0; + + f_owner = kzalloc(sizeof(struct fown_struct), GFP_KERNEL); + if (!f_owner) + return -ENOMEM; + + rwlock_init(&f_owner->lock); + f_owner->file = file; + /* If someone else raced us, drop our allocation. */ + if (unlikely(cmpxchg(&file->f_owner, NULL, f_owner))) + kfree(f_owner); + return 0; +} +EXPORT_SYMBOL(file_f_owner_allocate); + static void f_modown(struct file *filp, struct pid *pid, enum pid_type type, int force) { - write_lock_irq(&filp->f_owner.lock); - if (force || !filp->f_owner.pid) { - put_pid(filp->f_owner.pid); - filp->f_owner.pid = get_pid(pid); - filp->f_owner.pid_type = type; + struct fown_struct *f_owner; + + f_owner = file_f_owner(filp); + if (WARN_ON_ONCE(!f_owner)) + return; + + write_lock_irq(&f_owner->lock); + if (force || !f_owner->pid) { + put_pid(f_owner->pid); + f_owner->pid = get_pid(pid); + f_owner->pid_type = type; if (pid) { const struct cred *cred = current_cred(); - filp->f_owner.uid = cred->uid; - filp->f_owner.euid = cred->euid; + f_owner->uid = cred->uid; + f_owner->euid = cred->euid; } } - write_unlock_irq(&filp->f_owner.lock); + write_unlock_irq(&f_owner->lock); } void __f_setown(struct file *filp, struct pid *pid, enum pid_type type, @@ -119,6 +150,8 @@ int f_setown(struct file *filp, int who, int force) struct pid *pid = NULL; int ret = 0; + might_sleep(); + type = PIDTYPE_TGID; if (who < 0) { /* avoid overflow below */ @@ -129,6 +162,10 @@ int f_setown(struct file *filp, int who, int force) who = -who; } + ret = file_f_owner_allocate(filp); + if (ret) + return ret; + rcu_read_lock(); if (who) { pid = find_vpid(who); @@ -152,16 +189,21 @@ void f_delown(struct file *filp) pid_t f_getown(struct file *filp) { pid_t pid = 0; + struct fown_struct *f_owner; - read_lock_irq(&filp->f_owner.lock); + f_owner = file_f_owner(filp); + if (!f_owner) + return pid; + + read_lock_irq(&f_owner->lock); rcu_read_lock(); - if (pid_task(filp->f_owner.pid, filp->f_owner.pid_type)) { - pid = pid_vnr(filp->f_owner.pid); - if (filp->f_owner.pid_type == PIDTYPE_PGID) + if (pid_task(f_owner->pid, f_owner->pid_type)) { + pid = pid_vnr(f_owner->pid); + if (f_owner->pid_type == PIDTYPE_PGID) pid = -pid; } rcu_read_unlock(); - read_unlock_irq(&filp->f_owner.lock); + read_unlock_irq(&f_owner->lock); return pid; } @@ -194,6 +236,10 @@ static int f_setown_ex(struct file *filp, unsigned long arg) return -EINVAL; } + ret = file_f_owner_allocate(filp); + if (ret) + return ret; + rcu_read_lock(); pid = find_vpid(owner.pid); if (owner.pid && !pid) @@ -210,13 +256,20 @@ static int f_getown_ex(struct file *filp, unsigned long arg) struct f_owner_ex __user *owner_p = (void __user *)arg; struct f_owner_ex owner = {}; int ret = 0; + struct fown_struct *f_owner; + enum pid_type pid_type = PIDTYPE_PID; - read_lock_irq(&filp->f_owner.lock); - rcu_read_lock(); - if (pid_task(filp->f_owner.pid, filp->f_owner.pid_type)) - owner.pid = pid_vnr(filp->f_owner.pid); - rcu_read_unlock(); - switch (filp->f_owner.pid_type) { + f_owner = file_f_owner(filp); + if (f_owner) { + read_lock_irq(&f_owner->lock); + rcu_read_lock(); + if (pid_task(f_owner->pid, f_owner->pid_type)) + owner.pid = pid_vnr(f_owner->pid); + rcu_read_unlock(); + pid_type = f_owner->pid_type; + } + + switch (pid_type) { case PIDTYPE_PID: owner.type = F_OWNER_TID; break; @@ -234,7 +287,8 @@ static int f_getown_ex(struct file *filp, unsigned long arg) ret = -EINVAL; break; } - read_unlock_irq(&filp->f_owner.lock); + if (f_owner) + read_unlock_irq(&f_owner->lock); if (!ret) { ret = copy_to_user(owner_p, &owner, sizeof(owner)); @@ -248,14 +302,18 @@ static int f_getown_ex(struct file *filp, unsigned long arg) static int f_getowner_uids(struct file *filp, unsigned long arg) { struct user_namespace *user_ns = current_user_ns(); + struct fown_struct *f_owner; uid_t __user *dst = (void __user *)arg; - uid_t src[2]; + uid_t src[2] = {0, 0}; int err; - read_lock_irq(&filp->f_owner.lock); - src[0] = from_kuid(user_ns, filp->f_owner.uid); - src[1] = from_kuid(user_ns, filp->f_owner.euid); - read_unlock_irq(&filp->f_owner.lock); + f_owner = file_f_owner(filp); + if (f_owner) { + read_lock_irq(&f_owner->lock); + src[0] = from_kuid(user_ns, f_owner->uid); + src[1] = from_kuid(user_ns, f_owner->euid); + read_unlock_irq(&f_owner->lock); + } err = put_user(src[0], &dst[0]); err |= put_user(src[1], &dst[1]); @@ -343,6 +401,30 @@ static long f_dupfd_query(int fd, struct file *filp) return f.file == filp; } +static int f_owner_sig(struct file *filp, int signum, bool setsig) +{ + int ret = 0; + struct fown_struct *f_owner; + + might_sleep(); + + if (setsig) { + if (!valid_signal(signum)) + return -EINVAL; + + ret = file_f_owner_allocate(filp); + if (ret) + return ret; + } + + f_owner = file_f_owner(filp); + if (setsig) + f_owner->signum = signum; + else if (f_owner) + ret = f_owner->signum; + return ret; +} + static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, struct file *filp) { @@ -421,15 +503,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, err = f_getowner_uids(filp, arg); break; case F_GETSIG: - err = filp->f_owner.signum; + err = f_owner_sig(filp, 0, false); break; case F_SETSIG: - /* arg == 0 restores default behaviour. */ - if (!valid_signal(argi)) { - break; - } - err = 0; - filp->f_owner.signum = argi; + err = f_owner_sig(filp, argi, true); break; case F_GETLEASE: err = fcntl_getlease(filp); @@ -844,14 +921,19 @@ static void send_sigurg_to_task(struct task_struct *p, do_send_sig_info(SIGURG, SEND_SIG_PRIV, p, type); } -int send_sigurg(struct fown_struct *fown) +int send_sigurg(struct file *file) { + struct fown_struct *fown; struct task_struct *p; enum pid_type type; struct pid *pid; unsigned long flags; int ret = 0; + fown = file_f_owner(file); + if (fown) + return 0; + read_lock_irqsave(&fown->lock, flags); type = fown->pid_type; @@ -1027,13 +1109,16 @@ static void kill_fasync_rcu(struct fasync_struct *fa, int sig, int band) } read_lock_irqsave(&fa->fa_lock, flags); if (fa->fa_file) { - fown = &fa->fa_file->f_owner; + fown = file_f_owner(fa->fa_file); + if (!fown) + goto next; /* Don't send SIGURG to processes which have not set a queued signum: SIGURG has its own default signalling mechanism. */ if (!(sig == SIGURG && fown->signum == 0)) send_sigio(fown, fa->fa_fd, band); } +next: read_unlock_irqrestore(&fa->fa_lock, flags); fa = rcu_dereference(fa->fa_next); } diff --git a/fs/file_table.c b/fs/file_table.c index ca7843dde56d..41ff037a8dc9 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -155,7 +155,6 @@ static int init_file(struct file *f, int flags, const struct cred *cred) return error; } - rwlock_init(&f->f_owner.lock); spin_lock_init(&f->f_lock); mutex_init(&f->f_pos_lock); f->f_flags = flags; @@ -425,7 +424,10 @@ static void __fput(struct file *file) cdev_put(inode->i_cdev); } fops_put(file->f_op); - put_pid(file->f_owner.pid); + if (file->f_owner) { + put_pid(file->f_owner->pid); + kfree(file->f_owner); + } put_file_access(file); dput(dentry); if (unlikely(mode & FMODE_NEED_UNMOUNT)) diff --git a/fs/locks.c b/fs/locks.c index 9afb16e0683f..c0d312481b97 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -1451,7 +1451,7 @@ int lease_modify(struct file_lease *fl, int arg, struct list_head *dispose) struct file *filp = fl->c.flc_file; f_delown(filp); - filp->f_owner.signum = 0; + file_f_owner(filp)->signum = 0; fasync_helper(0, fl->c.flc_file, 0, &fl->fl_fasync); if (fl->fl_fasync != NULL) { printk(KERN_ERR "locks_delete_lock: fasync == %p\n", fl->fl_fasync); @@ -1783,6 +1783,10 @@ generic_add_lease(struct file *filp, int arg, struct file_lease **flp, void **pr lease = *flp; trace_generic_add_lease(inode, lease); + error = file_f_owner_allocate(filp); + if (error) + return error; + /* Note that arg is never F_UNLCK here */ ctx = locks_get_lock_context(inode, arg); if (!ctx) diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index f3669403fabf..46440fbb8662 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -110,7 +110,7 @@ static int dnotify_handle_event(struct fsnotify_mark *inode_mark, u32 mask, prev = &dn->dn_next; continue; } - fown = &dn->dn_filp->f_owner; + fown = file_f_owner(dn->dn_filp); send_sigio(fown, dn->dn_fd, POLL_MSG); if (dn->dn_mask & FS_DN_MULTISHOT) prev = &dn->dn_next; @@ -316,6 +316,10 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned int arg) goto out_err; } + error = file_f_owner_allocate(filp); + if (error) + goto out_err; + /* set up the new_fsn_mark and new_dn_mark */ new_fsn_mark = &new_dn_mark->fsn_mark; fsnotify_init_mark(new_fsn_mark, dnotify_group); diff --git a/include/linux/fs.h b/include/linux/fs.h index fd34b5755c0b..319c566a9e98 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -947,6 +947,7 @@ static inline unsigned imajor(const struct inode *inode) } struct fown_struct { + struct file *file; /* backpointer for security modules */ rwlock_t lock; /* protects pid, uid, euid fields */ struct pid *pid; /* pid or -pgrp where SIGIO should be sent */ enum pid_type pid_type; /* Kind of process group SIGIO should be sent to */ @@ -1011,7 +1012,7 @@ struct file { struct mutex f_pos_lock; loff_t f_pos; unsigned int f_flags; - struct fown_struct f_owner; + struct fown_struct *f_owner; const struct cred *f_cred; struct file_ra_state f_ra; struct path f_path; @@ -1076,6 +1077,12 @@ struct file_lease; #define OFFT_OFFSET_MAX type_max(off_t) #endif +int file_f_owner_allocate(struct file *file); +static inline struct fown_struct *file_f_owner(const struct file *file) +{ + return READ_ONCE(file->f_owner); +} + extern void send_sigio(struct fown_struct *fown, int fd, int band); static inline struct inode *file_inode(const struct file *f) @@ -1124,7 +1131,7 @@ extern void __f_setown(struct file *filp, struct pid *, enum pid_type, int force extern int f_setown(struct file *filp, int who, int force); extern void f_delown(struct file *filp); extern pid_t f_getown(struct file *filp); -extern int send_sigurg(struct fown_struct *fown); +extern int send_sigurg(struct file *file); /* * sb->s_flags. Note that these mirror the equivalent MS_* flags where diff --git a/net/core/sock.c b/net/core/sock.c index 9abc4fe25953..bbe4c58470c3 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -3429,7 +3429,7 @@ static void sock_def_destruct(struct sock *sk) void sk_send_sigurg(struct sock *sk) { if (sk->sk_socket && sk->sk_socket->file) - if (send_sigurg(&sk->sk_socket->file->f_owner)) + if (send_sigurg(sk->sk_socket->file)) sk_wake_async(sk, SOCK_WAKE_URG, POLL_PRI); } EXPORT_SYMBOL(sk_send_sigurg); diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 55c78c318ccd..ed252cfba4e9 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -3940,7 +3940,7 @@ static int selinux_file_send_sigiotask(struct task_struct *tsk, struct file_security_struct *fsec; /* struct fown_struct is never outside the context of a struct file */ - file = container_of(fown, struct file, f_owner); + file = fown->file; fsec = selinux_file(file); diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index 4164699cd4f6..cb33920ab67c 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -1950,7 +1950,7 @@ static int smack_file_send_sigiotask(struct task_struct *tsk, /* * struct fown_struct is never outside the context of a struct file */ - file = container_of(fown, struct file, f_owner); + file = fown->file; /* we don't log here as rc can be overriden */ blob = smack_file(file);
We do embedd struct fown_struct into struct file letting it take up 32 bytes in total. We could tweak struct fown_struct to be more compact but really it shouldn't even be embedded in struct file in the first place. Instead, actual users of struct fown_struct should allocate the struct on demand. This frees up 24 bytes in struct file. That will have some potentially user-visible changes for the ownership fcntl()s. Some of them can now fail due to allocation failures. Practically, that probably will almost never happen as the allocations are small and they only happen once per file. The fown_struct is used during kill_fasync() which is used by e.g., pipes to generate a SIGIO signal. Sending of such signals is conditional on userspace having set an owner for the file using one of the F_OWNER fcntl()s. Such users will be unaffected if struct fown_struct is allocated during the fcntl() call. There are a few subsystems that call __f_setown() expecting file->f_owner to be allocated: (1) tun devices file->f_op->fasync::tun_chr_fasync() -> __f_setown() There are no callers of tun_chr_fasync(). (2) tty devices file->f_op->fasync::tty_fasync() -> __tty_fasync() -> __f_setown() tty_fasync() has no additional callers but __tty_fasync() has. Note that __tty_fasync() only calls __f_setown() if the @on argument is true. It's called from: file->f_op->release::tty_release() -> tty_release() -> __tty_fasync() -> __f_setown() tty_release() calls __tty_fasync() with @on false => __f_setown() is never called from tty_release(). => All callers of tty_release() are safe as well. file->f_op->release::tty_open() -> tty_release() -> __tty_fasync() -> __f_setown() __tty_hangup() calls __tty_fasync() with @on false => __f_setown() is never called from tty_release(). => All callers of __tty_hangup() are safe as well. From the callchains it's obvious that (1) and (2) end up getting called via file->f_op->fasync(). That can happen either through the F_SETFL fcntl() with the FASYNC flag raised or via the FIOASYNC ioctl(). If FASYNC is requested and the file isn't already FASYNC then file->f_op->fasync() is called with @on true which ends up causing both (1) and (2) to call __f_setown(). (1) and (2) are the only subsystems that call __f_setown() from the file->f_op->fasync() handler. So both (1) and (2) have been updated to allocate a struct fown_struct prior to calling fasync_helper() to register with the fasync infrastructure. That's safe as they both call fasync_helper() which also does allocations if @on is true. The other interesting case are file leases: (3) file leases lease_manager_ops->lm_setup::lease_setup() -> __f_setown() Which in turn is called from: generic_add_lease() -> lease_manager_ops->lm_setup::lease_setup() -> __f_setown() So here again we can simply make generic_add_lease() allocate struct fown_struct prior to the lease_manager_ops->lm_setup::lease_setup() which happens under a spinlock. With that the two remaining subsystems that call __f_setown() are: (4) dnotify (5) sockets Both have their own custom ioctls to set struct fown_struct and both have been converted to allocate a struct fown_struct on demand from their respective ioctls. Interactions with O_PATH are fine as well e.g., when opening a /dev/tty as O_PATH then no file->f_op->open() happens thus no file->f_owner is allocated. That's fine as no file operation will be set for those and the device has never been opened. fcntl()s called on such things will just allocate a ->f_owner on demand. Although I have zero idea why'd you care about f_owner on an O_PATH fd. Signed-off-by: Christian Brauner <brauner@kernel.org> --- * There's no more cleanup macros used in this version as we can solve that all much simpler. * Survives LTP which tests a bunch of that stuff. * Survives perf's watermak signal tests which make use of FASYNC. Goes into -next unless I hear objections. --- --- drivers/net/tun.c | 6 ++ drivers/tty/tty_io.c | 6 ++ fs/fcntl.c | 153 ++++++++++++++++++++++++++++++++++---------- fs/file_table.c | 6 +- fs/locks.c | 6 +- fs/notify/dnotify/dnotify.c | 6 +- include/linux/fs.h | 11 +++- net/core/sock.c | 2 +- security/selinux/hooks.c | 2 +- security/smack/smack_lsm.c | 2 +- 10 files changed, 157 insertions(+), 43 deletions(-) --- base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b change-id: 20240813-work-f_owner-0fbbc50f9671