diff mbox series

[04/21] fs: Add RWF_ATOMIC and IOCB_ATOMIC flags for atomic write support

Message ID 20230929102726.2985188-5-john.g.garry@oracle.com (mailing list archive)
State New, archived
Headers show
Series block atomic writes | expand

Commit Message

John Garry Sept. 29, 2023, 10:27 a.m. UTC
From: Prasad Singamsetty <prasad.singamsetty@oracle.com>

Userspace may add flag RWF_ATOMIC to pwritev2() to indicate that the
write is to be issued with torn write prevention, according to special
alignment and length rules.

Torn write prevention means that for a power or any other HW failure, all
or none of the data will be committed to storage, but never a mix of old
and new.

For any syscall interface utilizing struct iocb, add IOCB_ATOMIC for
iocb->ki_flags field to indicate the same.

A call to statx will give the relevant atomic write info:
- atomic_write_unit_min
- atomic_write_unit_max

Both values are a power-of-2.

Applications can avail of atomic write feature by ensuring that the total
length of a write is a power-of-2 in size and also sized between
atomic_write_unit_min and atomic_write_unit_max, inclusive. Applications
must ensure that the write is at a naturally-aligned offset in the file
wrt the total write length.

Signed-off-by: Prasad Singamsetty <prasad.singamsetty@oracle.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 include/linux/fs.h      | 1 +
 include/uapi/linux/fs.h | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

Comments

Jeremy Bongio Oct. 6, 2023, 6:15 p.m. UTC | #1
What is the advantage of using write flags instead of using an atomic
open flag (O_ATOMIC)? With an open flag, write, writev, pwritev would
all be supported for atomic writes. And this would potentially require
less application changes to take advantage of atomic writes.

On Fri, Sep 29, 2023 at 3:28 AM John Garry <john.g.garry@oracle.com> wrote:
>
> From: Prasad Singamsetty <prasad.singamsetty@oracle.com>
>
> Userspace may add flag RWF_ATOMIC to pwritev2() to indicate that the
> write is to be issued with torn write prevention, according to special
> alignment and length rules.
>
> Torn write prevention means that for a power or any other HW failure, all
> or none of the data will be committed to storage, but never a mix of old
> and new.
>
> For any syscall interface utilizing struct iocb, add IOCB_ATOMIC for
> iocb->ki_flags field to indicate the same.
>
> A call to statx will give the relevant atomic write info:
> - atomic_write_unit_min
> - atomic_write_unit_max
>
> Both values are a power-of-2.
>
> Applications can avail of atomic write feature by ensuring that the total
> length of a write is a power-of-2 in size and also sized between
> atomic_write_unit_min and atomic_write_unit_max, inclusive. Applications
> must ensure that the write is at a naturally-aligned offset in the file
> wrt the total write length.
>
> Signed-off-by: Prasad Singamsetty <prasad.singamsetty@oracle.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  include/linux/fs.h      | 1 +
>  include/uapi/linux/fs.h | 5 ++++-
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index b528f063e8ff..898952dee8eb 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -328,6 +328,7 @@ enum rw_hint {
>  #define IOCB_SYNC              (__force int) RWF_SYNC
>  #define IOCB_NOWAIT            (__force int) RWF_NOWAIT
>  #define IOCB_APPEND            (__force int) RWF_APPEND
> +#define IOCB_ATOMIC            (__force int) RWF_ATOMIC
>
>  /* non-RWF related bits - start at 16 */
>  #define IOCB_EVENTFD           (1 << 16)
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index b7b56871029c..e3b4f5bc6860 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -301,8 +301,11 @@ typedef int __bitwise __kernel_rwf_t;
>  /* per-IO O_APPEND */
>  #define RWF_APPEND     ((__force __kernel_rwf_t)0x00000010)
>
> +/* Atomic Write */
> +#define RWF_ATOMIC     ((__force __kernel_rwf_t)0x00000020)
> +
>  /* mask of flags supported by the kernel */
>  #define RWF_SUPPORTED  (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
> -                        RWF_APPEND)
> +                        RWF_APPEND | RWF_ATOMIC)
>
>  #endif /* _UAPI_LINUX_FS_H */
> --
> 2.31.1
>
Dave Chinner Oct. 9, 2023, 10:02 p.m. UTC | #2
On Fri, Oct 06, 2023 at 11:15:11AM -0700, Jeremy Bongio wrote:
> What is the advantage of using write flags instead of using an atomic
> open flag (O_ATOMIC)? With an open flag, write, writev, pwritev would
> all be supported for atomic writes. And this would potentially require
> less application changes to take advantage of atomic writes.

Atomic writes are not a property of the file or even the inode
itself, they are an attribute of the specific IO being issued by
the application.

Most applications that want atomic writes are using it as a
performance optimisation. They are likely already using DIO with
either AIO, pwritev2 or io_uring and so are already using the
interfaces that support per-IO attributes. Not every IO to every
file needs to be atomic, so a per-IO attribute makes a lot of sense
for these applications.

Add to that that implementing atomic IO semantics in the generic IO
paths (e.g. for buffered writes) is much more difficult. It's
not an unsolvable problem (especially now with high-order folio
support in the page cache), it's just way outside the scope of this
patchset.

-Dave.
diff mbox series

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index b528f063e8ff..898952dee8eb 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -328,6 +328,7 @@  enum rw_hint {
 #define IOCB_SYNC		(__force int) RWF_SYNC
 #define IOCB_NOWAIT		(__force int) RWF_NOWAIT
 #define IOCB_APPEND		(__force int) RWF_APPEND
+#define IOCB_ATOMIC		(__force int) RWF_ATOMIC
 
 /* non-RWF related bits - start at 16 */
 #define IOCB_EVENTFD		(1 << 16)
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index b7b56871029c..e3b4f5bc6860 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -301,8 +301,11 @@  typedef int __bitwise __kernel_rwf_t;
 /* per-IO O_APPEND */
 #define RWF_APPEND	((__force __kernel_rwf_t)0x00000010)
 
+/* Atomic Write */
+#define RWF_ATOMIC	((__force __kernel_rwf_t)0x00000020)
+
 /* mask of flags supported by the kernel */
 #define RWF_SUPPORTED	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
-			 RWF_APPEND)
+			 RWF_APPEND | RWF_ATOMIC)
 
 #endif /* _UAPI_LINUX_FS_H */