Message ID | 20240730051625.14349-3-viro@kernel.org (mailing list archive) |
---|---|
State | New |
Series | [01/39] memcg_write_event_control(): fix a user-triggerable oops |
On Tue, Jul 30, 2024 at 01:15:49AM -0400, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> The absolute majority of instances comes from fdget() and its
> relatives; the underlying primitives actually return a struct file
> reference and a couple of flags encoded into an unsigned long - the lower
> two bits of file address are always zero, so we can stash the flags
> into those. On the way out we use __to_fd() to unpack that unsigned
> long into struct fd.
>
> Let's use that representation for struct fd itself - make it
> a structure with a single unsigned long member (.word), with the value
> equal either to (unsigned long)p | flags, p being an address of some
> struct file instance, or to 0 for an empty fd.
>
> Note that we never used a struct fd instance with NULL ->file
> and non-zero ->flags; the emptiness had been checked as (!f.file) and
> we expected e.g. fdput(empty) to be a no-op. With new representation
> we can use (!f.word) for emptiness check; that is enough for compiler
> to figure out that (f.word & FDPUT_FPUT) will be false and that fdput(f)
> will be a no-op in such case.
>
> For now the new predicate (fd_empty(f)) has no users; all the
> existing checks have form (!fd_file(f)). We will convert to fd_empty()
> use later; here we only define it (and tell the compiler that it's
> unlikely to return true).
>
> This commit only deals with representation change; there will
> be followups.

I'm still trawling through all of this code and trying to grok it, but one
thing I kept wondering was wtf would we do this ->word trick in the first
place? It seemed needlessly complicated and I don't love having structs with
imprecise members.

But then buried deep in your cover letter you have this

"""
It's not that hard to deal with - the real primitives behind fdget()
et al. are returning an unsigned long value, unpacked by (inlined)
__to_fd() into the current struct file * + int. Linus suggested that
keeping that unsigned long around with the extractions done by inlined
accessors should generate sane code and that turns out to be the case.
Turning struct fd into a struct-wrapped unsigned long, with
	fd_empty(f) => unlikely(f.word == 0)
	fd_file(f) => (struct file *)(f.word & ~3)
	fdput(f) => if (f.word & 1) fput(fd_file(f))
ends up with compiler doing the right thing. The cost is the patch
footprint, of course - we need to switch f.file to fd_file(f) all over
the tree, and it's not doable with simple search and replace; there are
false positives, etc. Might become a PITA at merge time; however,
actual update of that from 6.10-rc1 to 6.11-rc1 had brought surprisingly
few conflicts.
"""

Which makes a whole lot of sense. The member name doesn't matter much since
we're now using helpers everywhere, but for an idiot like me having something
like this

struct fd {
	unsigned long __file_ptr;
};

would make it so that you don't have random people deciding they can access
fd->file, and it helps make it more clear what exactly the point of this thing
is.

That being said, I think renaming it really isn't the important part; I think
including something like the bit you wrote above in the commit message would
help when somebody is trying to figure out why this more obtuse strategy is
used rather than just having struct file *__file with the low bit masking
stuff tacked on the side.

I'm still working through all the patches, but in general the doc made sense,
and the patches thus far are easy enough to follow. Thanks,

Josef
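
To make the three accessors quoted from the cover letter concrete, here is a minimal userspace sketch of the packed representation. It is an illustration under stated assumptions, not the kernel code: struct file is a stand-in holding only a reference count, fput() merely decrements it, make_fd() is a hypothetical helper that does not exist in the patch, uintptr_t is used in place of unsigned long so the sketch does not depend on the size of long, and the unlikely() hint is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct file: only a reference count. */
struct file {
	long refcount;
};

#define FDPUT_FPUT       ((uintptr_t)1)	/* reference was cloned; fdput() must drop it */
#define FDPUT_POS_UNLOCK ((uintptr_t)2)	/* file position lock is held */

/* Either (pointer | flags), or 0 for an empty fd. */
struct fd {
	uintptr_t word;
};

static inline bool fd_empty(struct fd f)
{
	return f.word == 0;
}

static inline struct file *fd_file(struct fd f)
{
	/* Strip the two flag bits to recover the pointer. */
	return (struct file *)(f.word & ~(uintptr_t)3);
}

/* Hypothetical helper, not in the patch: pack a pointer and flags. */
static inline struct fd make_fd(struct file *p, uintptr_t flags)
{
	/* The encoding relies on the two low address bits being zero. */
	assert(((uintptr_t)p & 3) == 0);
	assert((flags & ~(uintptr_t)3) == 0);
	return (struct fd){ (uintptr_t)p | flags };
}

/* Stand-in for fput(): drop one reference. */
static inline void fput(struct file *file)
{
	file->refcount--;
}

static inline void fdput(struct fd f)
{
	/* For an empty fd, word is 0, so this test is false and nothing happens. */
	if (f.word & FDPUT_FPUT)
		fput(fd_file(f));
}
```

With this layout an empty fd needs no special case in fdput(): a zero word has no flag bits set, which is the property the commit message relies on when it says the compiler can see that fdput(empty) is a no-op.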
On Tue, Jul 30, 2024 at 01:15:49AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> The absolute majority of instances comes from fdget() and its
> relatives; the underlying primitives actually return a struct file
> reference and a couple of flags encoded into an unsigned long - the lower
> two bits of file address are always zero, so we can stash the flags
> into those. On the way out we use __to_fd() to unpack that unsigned
> long into struct fd.
>
> Let's use that representation for struct fd itself - make it
> a structure with a single unsigned long member (.word), with the value
> equal either to (unsigned long)p | flags, p being an address of some
> struct file instance, or to 0 for an empty fd.
>
> Note that we never used a struct fd instance with NULL ->file
> and non-zero ->flags; the emptiness had been checked as (!f.file) and
> we expected e.g. fdput(empty) to be a no-op. With new representation
> we can use (!f.word) for emptiness check; that is enough for compiler
> to figure out that (f.word & FDPUT_FPUT) will be false and that fdput(f)
> will be a no-op in such case.
>
> For now the new predicate (fd_empty(f)) has no users; all the
> existing checks have form (!fd_file(f)). We will convert to fd_empty()
> use later; here we only define it (and tell the compiler that it's
> unlikely to return true).
>
> This commit only deals with representation change; there will
> be followups.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  drivers/infiniband/core/uverbs_cmd.c | 2 +-
>  fs/overlayfs/file.c | 28 +++++++++++++++-------------
>  fs/xfs/xfs_handle.c | 2 +-
>  include/linux/file.h | 22 ++++++++++++++++------
>  kernel/events/core.c | 2 +-
>  net/socket.c | 2 +-
>  6 files changed, 35 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index 3f85575cf971..a4cce360df21 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -572,7 +572,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
>  	struct inode *inode = NULL;
>  	int new_xrcd = 0;
>  	struct ib_device *ib_dev;
> -	struct fd f = {};
> +	struct fd f = EMPTY_FD;
>  	int ret;
>
>  	ret = uverbs_request(attrs, &cmd, sizeof(cmd));
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index c4963d0c5549..2b7a5a3a7a2f 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -93,11 +93,11 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
>  			       bool allow_meta)
>  {
>  	struct dentry *dentry = file_dentry(file);
> +	struct file *realfile = file->private_data;
>  	struct path realpath;
>  	int err;
>
> -	real->flags = 0;
> -	real->file = file->private_data;
> +	real->word = (unsigned long)realfile;
>
>  	if (allow_meta) {
>  		ovl_path_real(dentry, &realpath);
> @@ -113,16 +113,17 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
>  		return -EIO;
>
>  	/* Has it been copied up since we'd opened it? */
> -	if (unlikely(file_inode(real->file) != d_inode(realpath.dentry))) {
> -		real->flags = FDPUT_FPUT;
> -		real->file = ovl_open_realfile(file, &realpath);
> -
> -		return PTR_ERR_OR_ZERO(real->file);
> +	if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
> +		struct file *f = ovl_open_realfile(file, &realpath);
> +		if (IS_ERR(f))
> +			return PTR_ERR(f);
> +		real->word = (unsigned long)f | FDPUT_FPUT;
> +		return 0;
>  	}
>
>  	/* Did the flags change since open? */
> -	if (unlikely((file->f_flags ^ real->file->f_flags) & ~OVL_OPEN_FLAGS))
> -		return ovl_change_flags(real->file, file->f_flags);
> +	if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS))
> +		return ovl_change_flags(realfile, file->f_flags);
>
>  	return 0;
>  }
> @@ -130,10 +131,11 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
>  static int ovl_real_fdget(const struct file *file, struct fd *real)
>  {
>  	if (d_is_dir(file_dentry(file))) {
> -		real->flags = 0;
> -		real->file = ovl_dir_real_file(file, false);
> -
> -		return PTR_ERR_OR_ZERO(real->file);
> +		struct file *f = ovl_dir_real_file(file, false);
> +		if (IS_ERR(f))
> +			return PTR_ERR(f);
> +		real->word = (unsigned long)f;
> +		return 0;
>  	}
>
>  	return ovl_real_fdget_meta(file, real, false);
> diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
> index 7bcc4f519cb8..49e5e5f04e60 100644
> --- a/fs/xfs/xfs_handle.c
> +++ b/fs/xfs/xfs_handle.c
> @@ -85,7 +85,7 @@ xfs_find_handle(
>  	int hsize;
>  	xfs_handle_t handle;
>  	struct inode *inode;
> -	struct fd f = {NULL};
> +	struct fd f = EMPTY_FD;
>  	struct path path;
>  	int error;
>  	struct xfs_inode *ip;
> diff --git a/include/linux/file.h b/include/linux/file.h
> index 0f3f369f2450..bdd6e1766839 100644
> --- a/include/linux/file.h
> +++ b/include/linux/file.h
> @@ -35,18 +35,28 @@ static inline void fput_light(struct file *file, int fput_needed)
>  		fput(file);
>  }
>
> +/* either a reference to struct file + flags
> + * (cloned vs. borrowed, pos locked), with
> + * flags stored in lower bits of value,
> + * or empty (represented by 0).
> + */
>  struct fd {
> -	struct file *file;
> -	unsigned int flags;
> +	unsigned long word;
>  };
>  #define FDPUT_FPUT 1
>  #define FDPUT_POS_UNLOCK 2
>
> -#define fd_file(f) ((f).file)
> +#define fd_file(f) ((struct file *)((f).word & ~3))

Minor thing but it would be nice if you'd use the macros for this
instead of the raw number:

#define fd_file(f) ((struct file *)((f).word & ~(FDPUT_FPUT | FDPUT_POS_UNLOCK)))

Reviewed-by: Christian Brauner <brauner@kernel.org>
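
The suggestion above can be checked mechanically. The following is a small sketch, not part of the patch, showing that a mask derived from the flag macros is the same value as the raw ~3 and that it keeps the high pointer bits when widened to unsigned long. FDPUT_MASK and fd_flags() are hypothetical names introduced only for this illustration, and the sketch assumes, as the kernel does, that a pointer fits in an unsigned long.

```c
#include <assert.h>	/* static_assert (C11) */

struct file;		/* opaque here; only pointers to it are used */

struct fd {
	unsigned long word;
};

#define FDPUT_FPUT       1
#define FDPUT_POS_UNLOCK 2
/* Hypothetical name for the combined mask; not in the patch. */
#define FDPUT_MASK       (FDPUT_FPUT | FDPUT_POS_UNLOCK)

#define fd_file(f)  ((struct file *)((f).word & ~(unsigned long)FDPUT_MASK))
#define fd_flags(f) ((f).word & (unsigned long)FDPUT_MASK)	/* hypothetical */

/* Assumption of this sketch: a pointer fits in an unsigned long. */
static_assert(sizeof(unsigned long) == sizeof(void *),
	      "pointer must fit in unsigned long");

/* The named mask is exactly the raw constant it replaces. */
static_assert(FDPUT_MASK == 3, "flag bits no longer match the raw mask");

/*
 * The flags are plain ints, so ~FDPUT_MASK is the int -4; converting
 * that to unsigned long keeps all the high bits set.
 */
static_assert((unsigned long)~FDPUT_MASK == ~3UL, "mask must keep high bits");
```

One design note: if the flag macros were ever given an unsigned int type (for example 1U and 2U), ~(FDPUT_FPUT | FDPUT_POS_UNLOCK) would be a 32-bit value that zero-extends on 64-bit targets and clears the upper half of the pointer, which is why the sketch casts to unsigned long before complementing.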
> Which makes a whole lot of sense. The member name doesn't matter much since
> we're now using helpers everywhere, but for an idiot like me having something
> like this
>
> struct fd {
> 	unsigned long __file_ptr;
> };

I agree that that would be helpful.
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 3f85575cf971..a4cce360df21 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -572,7 +572,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
 	struct inode *inode = NULL;
 	int new_xrcd = 0;
 	struct ib_device *ib_dev;
-	struct fd f = {};
+	struct fd f = EMPTY_FD;
 	int ret;
 
 	ret = uverbs_request(attrs, &cmd, sizeof(cmd));
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index c4963d0c5549..2b7a5a3a7a2f 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -93,11 +93,11 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
 			       bool allow_meta)
 {
 	struct dentry *dentry = file_dentry(file);
+	struct file *realfile = file->private_data;
 	struct path realpath;
 	int err;
 
-	real->flags = 0;
-	real->file = file->private_data;
+	real->word = (unsigned long)realfile;
 
 	if (allow_meta) {
 		ovl_path_real(dentry, &realpath);
@@ -113,16 +113,17 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
 		return -EIO;
 
 	/* Has it been copied up since we'd opened it? */
-	if (unlikely(file_inode(real->file) != d_inode(realpath.dentry))) {
-		real->flags = FDPUT_FPUT;
-		real->file = ovl_open_realfile(file, &realpath);
-
-		return PTR_ERR_OR_ZERO(real->file);
+	if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
+		struct file *f = ovl_open_realfile(file, &realpath);
+		if (IS_ERR(f))
+			return PTR_ERR(f);
+		real->word = (unsigned long)f | FDPUT_FPUT;
+		return 0;
 	}
 
 	/* Did the flags change since open? */
-	if (unlikely((file->f_flags ^ real->file->f_flags) & ~OVL_OPEN_FLAGS))
-		return ovl_change_flags(real->file, file->f_flags);
+	if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS))
+		return ovl_change_flags(realfile, file->f_flags);
 
 	return 0;
 }
@@ -130,10 +131,11 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
 static int ovl_real_fdget(const struct file *file, struct fd *real)
 {
 	if (d_is_dir(file_dentry(file))) {
-		real->flags = 0;
-		real->file = ovl_dir_real_file(file, false);
-
-		return PTR_ERR_OR_ZERO(real->file);
+		struct file *f = ovl_dir_real_file(file, false);
+		if (IS_ERR(f))
+			return PTR_ERR(f);
+		real->word = (unsigned long)f;
+		return 0;
 	}
 
 	return ovl_real_fdget_meta(file, real, false);
diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
index 7bcc4f519cb8..49e5e5f04e60 100644
--- a/fs/xfs/xfs_handle.c
+++ b/fs/xfs/xfs_handle.c
@@ -85,7 +85,7 @@ xfs_find_handle(
 	int hsize;
 	xfs_handle_t handle;
 	struct inode *inode;
-	struct fd f = {NULL};
+	struct fd f = EMPTY_FD;
 	struct path path;
 	int error;
 	struct xfs_inode *ip;
diff --git a/include/linux/file.h b/include/linux/file.h
index 0f3f369f2450..bdd6e1766839 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -35,18 +35,28 @@ static inline void fput_light(struct file *file, int fput_needed)
 		fput(file);
 }
 
+/* either a reference to struct file + flags
+ * (cloned vs. borrowed, pos locked), with
+ * flags stored in lower bits of value,
+ * or empty (represented by 0).
+ */
 struct fd {
-	struct file *file;
-	unsigned int flags;
+	unsigned long word;
 };
 #define FDPUT_FPUT 1
 #define FDPUT_POS_UNLOCK 2
 
-#define fd_file(f) ((f).file)
+#define fd_file(f) ((struct file *)((f).word & ~3))
+static inline bool fd_empty(struct fd f)
+{
+	return unlikely(!f.word);
+}
+
+#define EMPTY_FD (struct fd){0}
 
 static inline void fdput(struct fd fd)
 {
-	if (fd.flags & FDPUT_FPUT)
+	if (fd.word & FDPUT_FPUT)
 		fput(fd_file(fd));
 }
 
@@ -60,7 +70,7 @@ extern void __f_unlock_pos(struct file *);
 
 static inline struct fd __to_fd(unsigned long v)
 {
-	return (struct fd){(struct file *)(v & ~3),v & 3};
+	return (struct fd){v};
 }
 
 static inline struct fd fdget(unsigned int fd)
@@ -80,7 +90,7 @@ static inline struct fd fdget_pos(int fd)
 
 static inline void fdput_pos(struct fd f)
 {
-	if (f.flags & FDPUT_POS_UNLOCK)
+	if (f.word & FDPUT_POS_UNLOCK)
 		__f_unlock_pos(fd_file(f));
 	fdput(f);
 }
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 17b19d3e74ba..fd2ac9c7fd77 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12474,7 +12474,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	struct perf_event_attr attr;
 	struct perf_event_context *ctx;
 	struct file *event_file = NULL;
-	struct fd group = {NULL, 0};
+	struct fd group = EMPTY_FD;
 	struct task_struct *task = NULL;
 	struct pmu *pmu;
 	int event_fd;
diff --git a/net/socket.c b/net/socket.c
index f77a42a74510..c0d4f5032374 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -559,7 +559,7 @@ static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed)
 	if (fd_file(f)) {
 		sock = sock_from_file(fd_file(f));
 		if (likely(sock)) {
-			*fput_needed = f.flags & FDPUT_FPUT;
+			*fput_needed = f.word & FDPUT_FPUT;
 			return sock;
 		}
 		*err = -ENOTSOCK;
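
The overlayfs hunks replace "store the result, then return PTR_ERR_OR_ZERO()" with "check IS_ERR() first, then pack". A plausible reason is that an error pointer has no spare low bits: once it is stored in the packed word, fd_file() would mask it into a different value. The sketch below models this in userspace. It is an illustration under assumptions, not the kernel code: ERR_PTR(), PTR_ERR() and IS_ERR() are simplified models of the kernel helpers, and open_realfile() and real_fdget() are hypothetical stand-ins for the overlayfs functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace models of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct file {
	int unused;	/* stand-in; int alignment leaves the low two bits free on common ABIs */
};

struct fd {
	uintptr_t word;
};

#define FDPUT_FPUT ((uintptr_t)1)

/* Hypothetical opener standing in for ovl_open_realfile(). */
static struct file *open_realfile(struct file *backing, bool fail)
{
	return fail ? ERR_PTR(-EIO) : backing;
}

/* Check the result first, then pack pointer and flag into one word. */
static int real_fdget(struct file *backing, bool fail, struct fd *real)
{
	struct file *f = open_realfile(backing, fail);

	if (IS_ERR(f))
		return (int)PTR_ERR(f);	/* *real is left untouched */

	assert(((uintptr_t)f & 3) == 0);	/* low bits must be free */
	real->word = (uintptr_t)f | FDPUT_FPUT;
	return 0;
}
```

As a worked example of what the early return avoids: an error pointer encoding -5 has both low bits set, so masking it with ~3 yields -8, a different error code. Returning before the word is written keeps error values out of the packed representation entirely.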