diff mbox series

[07/12] fs: add RWF_UNCACHED iocb and FOP_UNCACHED file_operations flag

Message ID 20241203153232.92224-9-axboe@kernel.dk (mailing list archive)
State New
Headers show
Series Uncached buffered IO | expand

Commit Message

Jens Axboe Dec. 3, 2024, 3:31 p.m. UTC
If a file system supports uncached buffered IO, it may set FOP_UNCACHED
and enable RWF_UNCACHED. If RWF_UNCACHED is attempted without the file
system supporting it, it'll get errored with -EOPNOTSUPP.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/fs.h      | 14 +++++++++++++-
 include/uapi/linux/fs.h |  6 +++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

Comments

Darrick J. Wong Dec. 6, 2024, 5:35 p.m. UTC | #1
On Tue, Dec 03, 2024 at 08:31:43AM -0700, Jens Axboe wrote:
> If a file system supports uncached buffered IO, it may set FOP_UNCACHED
> and enable RWF_UNCACHED. If RWF_UNCACHED is attempted without the file
> system supporting it, it'll get errored with -EOPNOTSUPP.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  include/linux/fs.h      | 14 +++++++++++++-
>  include/uapi/linux/fs.h |  6 +++++-
>  2 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 7e29433c5ecc..b64a78582f06 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -322,6 +322,7 @@ struct readahead_control;
>  #define IOCB_NOWAIT		(__force int) RWF_NOWAIT
>  #define IOCB_APPEND		(__force int) RWF_APPEND
>  #define IOCB_ATOMIC		(__force int) RWF_ATOMIC
> +#define IOCB_UNCACHED		(__force int) RWF_UNCACHED
>  
>  /* non-RWF related bits - start at 16 */
>  #define IOCB_EVENTFD		(1 << 16)
> @@ -356,7 +357,8 @@ struct readahead_control;
>  	{ IOCB_SYNC,		"SYNC" }, \
>  	{ IOCB_NOWAIT,		"NOWAIT" }, \
>  	{ IOCB_APPEND,		"APPEND" }, \
> -	{ IOCB_ATOMIC,		"ATOMIC"}, \
> +	{ IOCB_ATOMIC,		"ATOMIC" }, \
> +	{ IOCB_UNCACHED,	"UNCACHED" }, \
>  	{ IOCB_EVENTFD,		"EVENTFD"}, \
>  	{ IOCB_DIRECT,		"DIRECT" }, \
>  	{ IOCB_WRITE,		"WRITE" }, \
> @@ -2127,6 +2129,8 @@ struct file_operations {
>  #define FOP_UNSIGNED_OFFSET	((__force fop_flags_t)(1 << 5))
>  /* Supports asynchronous lock callbacks */
>  #define FOP_ASYNC_LOCK		((__force fop_flags_t)(1 << 6))
> +/* File system supports uncached read/write buffered IO */
> +#define FOP_UNCACHED		((__force fop_flags_t)(1 << 7))
>  
>  /* Wrap a directory iterator that needs exclusive inode access */
>  int wrap_directory_iterator(struct file *, struct dir_context *,
> @@ -3614,6 +3618,14 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags,
>  		if (!(ki->ki_filp->f_mode & FMODE_CAN_ATOMIC_WRITE))
>  			return -EOPNOTSUPP;
>  	}
> +	if (flags & RWF_UNCACHED) {

Should FMODE_NOREUSE imply RWF_UNCACHED?  I know, I'm dredging this up
again from v3:

https://lore.kernel.org/linux-fsdevel/ZzKn4OyHXq5r6eiI@dread.disaster.area/

but the manpage for fadvise says NOREUSE means "The specified data will
be accessed only once." and I think that fits what you're doing here.
And yeah, it's annoying that people keep asking for moar knobs to tweak
io operations: Let's have a mount option, and a fadvise mode, and a
fcntl mode, and finally per-io flags!  (mostly kidding)

Also, one of your replies referenced a poc to set UNCACHED on NOREUSE
involving willy and yu.  Where was that?  I've found this:

https://lore.kernel.org/linux-fsdevel/ZzI97bky3Rwzw18C@casper.infradead.org/

but that turned into a documentation discussion.

There were also a few unanswered questions (imo) from the last few
iterations of this patchset.

If someone issues a lot of small appending uncached writes to a file,
does that mean the writes and writeback will now be lockstepping each
other to write out the folio?  Or should programs simply not do that?

What if I wanted to do a bunch of small writes to adjacent bytes,
amortize writeback over a single disk io, and not wait for reclaim to
drop the folio?  Admittedly that doesn't really fit with "will be
accessed only once" so I think "don't do that" is an acceptable answer.

And, I guess if the application really wants fine-grained control then
it /can/ still pwrite, sync_file_range, and fadvise(WONTNEED).  Though
that's three syscalls/uring ops/whatever.  But that might be cheaper
than repeated rewrites.

--D

> +		/* file system must support it */
> +		if (!(ki->ki_filp->f_op->fop_flags & FOP_UNCACHED))
> +			return -EOPNOTSUPP;
> +		/* DAX mappings not supported */
> +		if (IS_DAX(ki->ki_filp->f_mapping->host))
> +			return -EOPNOTSUPP;
> +	}
>  	kiocb_flags |= (__force int) (flags & RWF_SUPPORTED);
>  	if (flags & RWF_SYNC)
>  		kiocb_flags |= IOCB_DSYNC;
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 753971770733..dc77cd8ae1a3 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -332,9 +332,13 @@ typedef int __bitwise __kernel_rwf_t;
>  /* Atomic Write */
>  #define RWF_ATOMIC	((__force __kernel_rwf_t)0x00000040)
>  
> +/* buffered IO that drops the cache after reading or writing data */
> +#define RWF_UNCACHED	((__force __kernel_rwf_t)0x00000080)
> +
>  /* mask of flags supported by the kernel */
>  #define RWF_SUPPORTED	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
> -			 RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC)
> +			 RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC |\
> +			 RWF_UNCACHED)
>  
>  #define PROCFS_IOCTL_MAGIC 'f'
>  
> -- 
> 2.45.2
> 
>
Christoph Hellwig Dec. 10, 2024, 11:22 a.m. UTC | #2
On Tue, Dec 03, 2024 at 08:31:43AM -0700, Jens Axboe wrote:
> +	if (flags & RWF_UNCACHED) {
> +		/* file system must support it */
> +		if (!(ki->ki_filp->f_op->fop_flags & FOP_UNCACHED))
> +			return -EOPNOTSUPP;
> +		/* DAX mappings not supported */
> +		if (IS_DAX(ki->ki_filp->f_mapping->host))
> +			return -EOPNOTSUPP;

I'd argue that DAX is always uncached and could just ignore the flag.
Same for direct I/O.
Jens Axboe Dec. 12, 2024, 7:42 p.m. UTC | #3
On 12/10/24 4:22 AM, Christoph Hellwig wrote:
> On Tue, Dec 03, 2024 at 08:31:43AM -0700, Jens Axboe wrote:
>> +	if (flags & RWF_UNCACHED) {
>> +		/* file system must support it */
>> +		if (!(ki->ki_filp->f_op->fop_flags & FOP_UNCACHED))
>> +			return -EOPNOTSUPP;
>> +		/* DAX mappings not supported */
>> +		if (IS_DAX(ki->ki_filp->f_mapping->host))
>> +			return -EOPNOTSUPP;
> 
> I'd argue that DAX is always uncached and could just ignore the flag.
> Same for direct I/O.

It's more of a safe guard in terms of the invalidation requiring extra
work for DAX.
diff mbox series

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7e29433c5ecc..b64a78582f06 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -322,6 +322,7 @@  struct readahead_control;
 #define IOCB_NOWAIT		(__force int) RWF_NOWAIT
 #define IOCB_APPEND		(__force int) RWF_APPEND
 #define IOCB_ATOMIC		(__force int) RWF_ATOMIC
+#define IOCB_UNCACHED		(__force int) RWF_UNCACHED
 
 /* non-RWF related bits - start at 16 */
 #define IOCB_EVENTFD		(1 << 16)
@@ -356,7 +357,8 @@  struct readahead_control;
 	{ IOCB_SYNC,		"SYNC" }, \
 	{ IOCB_NOWAIT,		"NOWAIT" }, \
 	{ IOCB_APPEND,		"APPEND" }, \
-	{ IOCB_ATOMIC,		"ATOMIC"}, \
+	{ IOCB_ATOMIC,		"ATOMIC" }, \
+	{ IOCB_UNCACHED,	"UNCACHED" }, \
 	{ IOCB_EVENTFD,		"EVENTFD"}, \
 	{ IOCB_DIRECT,		"DIRECT" }, \
 	{ IOCB_WRITE,		"WRITE" }, \
@@ -2127,6 +2129,8 @@  struct file_operations {
 #define FOP_UNSIGNED_OFFSET	((__force fop_flags_t)(1 << 5))
 /* Supports asynchronous lock callbacks */
 #define FOP_ASYNC_LOCK		((__force fop_flags_t)(1 << 6))
+/* File system supports uncached read/write buffered IO */
+#define FOP_UNCACHED		((__force fop_flags_t)(1 << 7))
 
 /* Wrap a directory iterator that needs exclusive inode access */
 int wrap_directory_iterator(struct file *, struct dir_context *,
@@ -3614,6 +3618,14 @@  static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags,
 		if (!(ki->ki_filp->f_mode & FMODE_CAN_ATOMIC_WRITE))
 			return -EOPNOTSUPP;
 	}
+	if (flags & RWF_UNCACHED) {
+		/* file system must support it */
+		if (!(ki->ki_filp->f_op->fop_flags & FOP_UNCACHED))
+			return -EOPNOTSUPP;
+		/* DAX mappings not supported */
+		if (IS_DAX(ki->ki_filp->f_mapping->host))
+			return -EOPNOTSUPP;
+	}
 	kiocb_flags |= (__force int) (flags & RWF_SUPPORTED);
 	if (flags & RWF_SYNC)
 		kiocb_flags |= IOCB_DSYNC;
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 753971770733..dc77cd8ae1a3 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -332,9 +332,13 @@  typedef int __bitwise __kernel_rwf_t;
 /* Atomic Write */
 #define RWF_ATOMIC	((__force __kernel_rwf_t)0x00000040)
 
+/* buffered IO that drops the cache after reading or writing data */
+#define RWF_UNCACHED	((__force __kernel_rwf_t)0x00000080)
+
 /* mask of flags supported by the kernel */
 #define RWF_SUPPORTED	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
-			 RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC)
+			 RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC |\
+			 RWF_UNCACHED)
 
 #define PROCFS_IOCTL_MAGIC 'f'