Message ID | 20241203153232.92224-9-axboe@kernel.dk (mailing list archive) |
---|---|
State | New |
Series | Uncached buffered IO |
On Tue, Dec 03, 2024 at 08:31:43AM -0700, Jens Axboe wrote:
> If a file system supports uncached buffered IO, it may set FOP_UNCACHED
> and enable RWF_UNCACHED. If RWF_UNCACHED is attempted without the file
> system supporting it, it'll get errored with -EOPNOTSUPP.
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  include/linux/fs.h      | 14 +++++++++++++-
>  include/uapi/linux/fs.h |  6 +++++-
>  2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 7e29433c5ecc..b64a78582f06 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -322,6 +322,7 @@ struct readahead_control;
>  #define IOCB_NOWAIT             (__force int) RWF_NOWAIT
>  #define IOCB_APPEND             (__force int) RWF_APPEND
>  #define IOCB_ATOMIC             (__force int) RWF_ATOMIC
> +#define IOCB_UNCACHED           (__force int) RWF_UNCACHED
>
>  /* non-RWF related bits - start at 16 */
>  #define IOCB_EVENTFD            (1 << 16)
> @@ -356,7 +357,8 @@ struct readahead_control;
>          { IOCB_SYNC,            "SYNC" }, \
>          { IOCB_NOWAIT,          "NOWAIT" }, \
>          { IOCB_APPEND,          "APPEND" }, \
> -        { IOCB_ATOMIC,          "ATOMIC"}, \
> +        { IOCB_ATOMIC,          "ATOMIC" }, \
> +        { IOCB_UNCACHED,        "UNCACHED" }, \
>          { IOCB_EVENTFD,         "EVENTFD"}, \
>          { IOCB_DIRECT,          "DIRECT" }, \
>          { IOCB_WRITE,           "WRITE" }, \
> @@ -2127,6 +2129,8 @@ struct file_operations {
>  #define FOP_UNSIGNED_OFFSET     ((__force fop_flags_t)(1 << 5))
>  /* Supports asynchronous lock callbacks */
>  #define FOP_ASYNC_LOCK          ((__force fop_flags_t)(1 << 6))
> +/* File system supports uncached read/write buffered IO */
> +#define FOP_UNCACHED            ((__force fop_flags_t)(1 << 7))
>
>  /* Wrap a directory iterator that needs exclusive inode access */
>  int wrap_directory_iterator(struct file *, struct dir_context *,
> @@ -3614,6 +3618,14 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags,
>                  if (!(ki->ki_filp->f_mode & FMODE_CAN_ATOMIC_WRITE))
>                          return -EOPNOTSUPP;
>          }
> +        if (flags & RWF_UNCACHED) {

Should FMODE_NOREUSE imply RWF_UNCACHED? I know, I'm dredging this up
again from v3:

https://lore.kernel.org/linux-fsdevel/ZzKn4OyHXq5r6eiI@dread.disaster.area/

but the manpage for fadvise says NOREUSE means "The specified data will
be accessed only once." and I think that fits what you're doing here.
And yeah, it's annoying that people keep asking for moar knobs to tweak
io operations: Let's have a mount option, and a fadvise mode, and a
fcntl mode, and finally per-io flags! (mostly kidding)

Also, one of your replies referenced a PoC to set UNCACHED on NOREUSE
involving willy and yu. Where was that? I've found this:

https://lore.kernel.org/linux-fsdevel/ZzI97bky3Rwzw18C@casper.infradead.org/

but that turned into a documentation discussion.

There were also a few unanswered questions (imo) from the last few
iterations of this patchset.

If someone issues a lot of small appending uncached writes to a file,
does that mean the writes and writeback will now be lockstepping each
other to write out the folio? Or should programs simply not do that?

What if I wanted to do a bunch of small writes to adjacent bytes,
amortize writeback over a single disk io, and not wait for reclaim to
drop the folio? Admittedly that doesn't really fit with "will be
accessed only once" so I think "don't do that" is an acceptable answer.

And, I guess if the application really wants fine-grained control then
it /can/ still pwrite, sync_file_range, and fadvise(DONTNEED). Though
that's three syscalls/uring ops/whatever. But that might be cheaper
than repeated rewrites.

--D

> +                /* file system must support it */
> +                if (!(ki->ki_filp->f_op->fop_flags & FOP_UNCACHED))
> +                        return -EOPNOTSUPP;
> +                /* DAX mappings not supported */
> +                if (IS_DAX(ki->ki_filp->f_mapping->host))
> +                        return -EOPNOTSUPP;
> +        }
>          kiocb_flags |= (__force int) (flags & RWF_SUPPORTED);
>          if (flags & RWF_SYNC)
>                  kiocb_flags |= IOCB_DSYNC;
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 753971770733..dc77cd8ae1a3 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -332,9 +332,13 @@ typedef int __bitwise __kernel_rwf_t;
>  /* Atomic Write */
>  #define RWF_ATOMIC      ((__force __kernel_rwf_t)0x00000040)
>
> +/* buffered IO that drops the cache after reading or writing data */
> +#define RWF_UNCACHED    ((__force __kernel_rwf_t)0x00000080)
> +
>  /* mask of flags supported by the kernel */
>  #define RWF_SUPPORTED   (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
> -                         RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC)
> +                         RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC |\
> +                         RWF_UNCACHED)
>
>  #define PROCFS_IOCTL_MAGIC 'f'
>
> --
> 2.45.2
>
>
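For readers following along, the three-call alternative mentioned in the reply above (pwrite, sync_file_range, then fadvise with POSIX_FADV_DONTNEED) might look roughly like the userspace sketch below. The helper name write_then_drop() is hypothetical and not part of the patch; the choice of sync_file_range() flags is an assumption about what "write it out and wait" should mean here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for sync_file_range() */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write len bytes at off, push that range to disk, then ask the kernel to
 * drop the now-clean pages. Returns 0 on success or a negative errno.
 * write_then_drop() is a hypothetical helper name, not part of the patch.
 */
static int write_then_drop(int fd, const void *buf, size_t len, off_t off)
{
        const char *p = buf;
        size_t done = 0;
        int err;

        assert(buf != NULL);

        /* 1) Ordinary buffered write; loop to handle short writes. */
        while (done < len) {
                ssize_t n = pwrite(fd, p + done, len - done, off + (off_t)done);

                if (n < 0) {
                        if (errno == EINTR)
                                continue;
                        return -errno;
                }
                done += (size_t)n;
        }

        /* 2) Start writeback for the range and wait for it to complete. */
        if (sync_file_range(fd, off, (off_t)len,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) < 0)
                return -errno;

        /* 3) Advise that the range is no longer needed so clean pages can go. */
        err = posix_fadvise(fd, off, (off_t)len, POSIX_FADV_DONTNEED);

        /* posix_fadvise() returns the error number directly, not via errno. */
        return err ? -err : 0;
}
```

A caller batching many small adjacent writes could do the plain pwrite() calls itself and run only steps 2 and 3 once over the whole range, which is the amortization the reply asks about. Note that sync_file_range() does not flush file metadata, so this is a cache-management sketch, not a durability guarantee.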
On Tue, Dec 03, 2024 at 08:31:43AM -0700, Jens Axboe wrote:
> +        if (flags & RWF_UNCACHED) {
> +                /* file system must support it */
> +                if (!(ki->ki_filp->f_op->fop_flags & FOP_UNCACHED))
> +                        return -EOPNOTSUPP;
> +                /* DAX mappings not supported */
> +                if (IS_DAX(ki->ki_filp->f_mapping->host))
> +                        return -EOPNOTSUPP;

I'd argue that DAX is always uncached and could just ignore the flag.
Same for direct I/O.
On 12/10/24 4:22 AM, Christoph Hellwig wrote:
> On Tue, Dec 03, 2024 at 08:31:43AM -0700, Jens Axboe wrote:
>> +        if (flags & RWF_UNCACHED) {
>> +                /* file system must support it */
>> +                if (!(ki->ki_filp->f_op->fop_flags & FOP_UNCACHED))
>> +                        return -EOPNOTSUPP;
>> +                /* DAX mappings not supported */
>> +                if (IS_DAX(ki->ki_filp->f_mapping->host))
>> +                        return -EOPNOTSUPP;
>
> I'd argue that DAX is always uncached and could just ignore the flag.
> Same for direct I/O.

It's more of a safeguard, since the invalidation requires extra work
for DAX.
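To make the two positions in this exchange concrete, here is a small userspace model of the flag check. It is only a sketch: struct file_model and both function names are hypothetical, the MODEL_* constants merely mirror the bit values in the patch, and the second function is one possible reading of the "just ignore the flag" suggestion, not code anyone posted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Bit values copied from the patch; the MODEL_ names are illustrative only. */
#define MODEL_RWF_UNCACHED 0x00000080u
#define MODEL_FOP_UNCACHED (1u << 7)

/* Hypothetical stand-in for the bits of struct file the check looks at. */
struct file_model {
        unsigned int fop_flags; /* file system capability flags */
        bool is_dax;            /* models IS_DAX(mapping->host) */
};

/* Behaviour of the patch as posted: unsupported fs or DAX is an error. */
static int check_uncached_posted(const struct file_model *f, unsigned int *rwf)
{
        if (!(*rwf & MODEL_RWF_UNCACHED))
                return 0;
        if (!(f->fop_flags & MODEL_FOP_UNCACHED))
                return -EOPNOTSUPP;
        if (f->is_dax)
                return -EOPNOTSUPP;
        return 0;
}

/*
 * One reading of the review suggestion: DAX has no page cache to drop,
 * so silently clear the flag and let the IO proceed.
 */
static int check_uncached_ignore_dax(const struct file_model *f, unsigned int *rwf)
{
        if (!(*rwf & MODEL_RWF_UNCACHED))
                return 0;
        if (f->is_dax) {
                *rwf &= ~MODEL_RWF_UNCACHED;
                return 0;
        }
        if (!(f->fop_flags & MODEL_FOP_UNCACHED))
                return -EOPNOTSUPP;
        return 0;
}
```

The visible difference for userspace would be whether a DAX file yields -EOPNOTSUPP or a successful IO with the flag quietly dropped.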
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7e29433c5ecc..b64a78582f06 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -322,6 +322,7 @@ struct readahead_control;
 #define IOCB_NOWAIT             (__force int) RWF_NOWAIT
 #define IOCB_APPEND             (__force int) RWF_APPEND
 #define IOCB_ATOMIC             (__force int) RWF_ATOMIC
+#define IOCB_UNCACHED           (__force int) RWF_UNCACHED
 
 /* non-RWF related bits - start at 16 */
 #define IOCB_EVENTFD            (1 << 16)
@@ -356,7 +357,8 @@ struct readahead_control;
         { IOCB_SYNC,            "SYNC" }, \
         { IOCB_NOWAIT,          "NOWAIT" }, \
         { IOCB_APPEND,          "APPEND" }, \
-        { IOCB_ATOMIC,          "ATOMIC"}, \
+        { IOCB_ATOMIC,          "ATOMIC" }, \
+        { IOCB_UNCACHED,        "UNCACHED" }, \
         { IOCB_EVENTFD,         "EVENTFD"}, \
         { IOCB_DIRECT,          "DIRECT" }, \
         { IOCB_WRITE,           "WRITE" }, \
@@ -2127,6 +2129,8 @@ struct file_operations {
 #define FOP_UNSIGNED_OFFSET     ((__force fop_flags_t)(1 << 5))
 /* Supports asynchronous lock callbacks */
 #define FOP_ASYNC_LOCK          ((__force fop_flags_t)(1 << 6))
+/* File system supports uncached read/write buffered IO */
+#define FOP_UNCACHED            ((__force fop_flags_t)(1 << 7))
 
 /* Wrap a directory iterator that needs exclusive inode access */
 int wrap_directory_iterator(struct file *, struct dir_context *,
@@ -3614,6 +3618,14 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags,
                 if (!(ki->ki_filp->f_mode & FMODE_CAN_ATOMIC_WRITE))
                         return -EOPNOTSUPP;
         }
+        if (flags & RWF_UNCACHED) {
+                /* file system must support it */
+                if (!(ki->ki_filp->f_op->fop_flags & FOP_UNCACHED))
+                        return -EOPNOTSUPP;
+                /* DAX mappings not supported */
+                if (IS_DAX(ki->ki_filp->f_mapping->host))
+                        return -EOPNOTSUPP;
+        }
         kiocb_flags |= (__force int) (flags & RWF_SUPPORTED);
         if (flags & RWF_SYNC)
                 kiocb_flags |= IOCB_DSYNC;
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 753971770733..dc77cd8ae1a3 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -332,9 +332,13 @@ typedef int __bitwise __kernel_rwf_t;
 /* Atomic Write */
 #define RWF_ATOMIC      ((__force __kernel_rwf_t)0x00000040)
 
+/* buffered IO that drops the cache after reading or writing data */
+#define RWF_UNCACHED    ((__force __kernel_rwf_t)0x00000080)
+
 /* mask of flags supported by the kernel */
 #define RWF_SUPPORTED   (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
-                         RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC)
+                         RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC |\
+                         RWF_UNCACHED)
 
 #define PROCFS_IOCTL_MAGIC 'f'
 
If a file system supports uncached buffered IO, it may set FOP_UNCACHED
and enable RWF_UNCACHED. If RWF_UNCACHED is attempted without the file
system supporting it, it'll get errored with -EOPNOTSUPP.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/fs.h      | 14 +++++++++++++-
 include/uapi/linux/fs.h |  6 +++++-
 2 files changed, 18 insertions(+), 2 deletions(-)
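From the userspace side, the -EOPNOTSUPP behaviour described in the commit message suggests a simple probe-and-fall-back pattern. The sketch below assumes a kernel carrying this patch and a libc that provides pwritev2(); the helper name pwrite_uncached() is hypothetical, and the RWF_UNCACHED fallback define just copies the value from the diff in case the installed uapi headers lack it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for pwritev2() */
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Value taken from the patch; only meaningful on a kernel that carries it. */
#ifndef RWF_UNCACHED
#define RWF_UNCACHED 0x00000080
#endif

/*
 * Try an uncached buffered write; if the kernel or file system rejects the
 * flag with EOPNOTSUPP, retry as a normal buffered write. *used_uncached
 * (optional) reports which path was taken. Hypothetical helper, not part
 * of the patch. Returns bytes written or -1 with errno set.
 */
static ssize_t pwrite_uncached(int fd, const void *buf, size_t len, off_t off,
                               int *used_uncached)
{
        struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
        ssize_t n;

        assert(buf != NULL);

        n = pwritev2(fd, &iov, 1, off, RWF_UNCACHED);
        if (n >= 0) {
                if (used_uncached)
                        *used_uncached = 1;
                return n;
        }
        if (errno != EOPNOTSUPP)
                return -1;

        /* Flag not supported here: fall back to a regular buffered write. */
        if (used_uncached)
                *used_uncached = 0;
        return pwritev2(fd, &iov, 1, off, 0);
}
```

Because kiocb_set_rw_flags() already rejects flags outside RWF_SUPPORTED with -EOPNOTSUPP, the same fallback should cover both a kernel without the patch and a file system that has not set FOP_UNCACHED.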