Message ID | 20240717093619.3148729-3-john.g.garry@oracle.com (mailing list archive) |
---|---|
State | New |
Series | man2: Document RWF_ATOMIC |
On Wed, Jul 17, 2024 at 09:36:18AM +0000, John Garry wrote:
> From: Himanshu Madhani <himanshu.madhani@oracle.com>
>
> Add RWF_ATOMIC flag description for pwritev2().
>
> Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
> [jpg: complete rewrite]
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  man/man2/readv.2 | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)
>
> diff --git a/man/man2/readv.2 b/man/man2/readv.2
> index eecde06dc..9c8a11324 100644
> --- a/man/man2/readv.2
> +++ b/man/man2/readv.2
> @@ -193,6 +193,66 @@ which provides lower latency, but may use additional resources.
>  .B O_DIRECT
>  flag.)
>  .TP
> +.BR RWF_ATOMIC " (since Linux 6.11)"
> +Requires that writes to regular files in block-based filesystems be issued with
> +torn-write protection.
> +Torn-write protection means that for a power or any other hardware failure,
> +all or none of the data from the write will be stored,
> +but never a mix of old and new data.
> +This flag is meaningful only for
> +.BR pwritev2 (),
> +and its effect applies only to the data range written by the system call.
> +The total write length must be power-of-2 and must be sized in the range
> +.RI [ stx_atomic_write_unit_min ,
> +.IR stx_atomic_write_unit_max ].
> +The write must be at a naturally-aligned offset within the file with respect to
> +the total write length -
> +for example,

Nit: these could be two sentences

"The write must be at a naturally-aligned offset within the file with
respect to the total write length. For example, ..."

> +a write of length 32KB at a file offset of 32KB is permitted,
> +however a write of length 32KB at a file offset of 48KB is not permitted.

Pickier nit: KiB, not KB.

> +The upper limit of
> +.I iovcnt
> +for
> +.BR pwritev2 ()
> +is in

"is given by" ?

> +.I stx_atomic_write_segments_max.
> +Torn-write protection only works with
> +.B O_DIRECT
> +flag, i.e. buffered writes are not supported.
> +To guarantee consistency from the write between a file's in-core state with the
> +storage device,
> +.BR fdatasync (2),
> +or
> +.BR fsync (2),
> +or
> +.BR open (2)
> +and either
> +.B O_SYNC
> +or
> +.B O_DSYNC,
> +or
> +.B pwritev2 ()
> +and either
> +.B RWF_SYNC
> +or
> +.B RWF_DSYNC
> +is required. Flags

This sentence ^^ should start on a new line.

> +.B O_SYNC
> +or
> +.B RWF_SYNC
> +provide the strongest guarantees for
> +.BR RWF_ATOMIC,
> +in that all data and also file metadata updates will be persisted for a
> +successfully completed write.
> +Just using either flags
> +.B O_DSYNC
> +or
> +.B RWF_DSYNC
> +means that all data and any file updates will be persisted for a successfully
> +completed write.

"any file updates" ? I /think/ the difference between O_SYNC and
O_DSYNC is that O_DSYNC persists all data and file metadata updates for
the file range that was written, whereas O_SYNC persists all data and
file metadata updates for the entire file.

Perhaps everything between "Flags O_SYNC or RWF_SYNC..." and "...for a
successfully completed write." should instead refer readers to the notes
about synchronized I/O flags in the openat manpage?

> +Not using any sync flags means that there is no guarantee that data or
> +filesystem updates are persisted.
> +.TP
>  .BR RWF_SYNC " (since Linux 4.7)"
>  .\" commit e864f39569f4092c2b2bc72c773b6e486c7e3bd9
>  Provide a per-write equivalent of the
> @@ -279,10 +339,26 @@ values overflows an
>  .I ssize_t
>  value.
>  .TP
> +.B EINVAL
> + For
> +.BR RWF_ATOMIC
> +set,

"If RWF_ATOMIC is specified..." ?

(to be a bit more consistent with the language around the AT_* flags in
openat)

> +the combination of the sum of the
> +.I iov_len
> +values and the
> +.I offset
> +value does not comply with the length and offset torn-write protection rules.
> +.TP
>  .B EINVAL
>  The vector count,
>  .IR iovcnt ,
>  is less than zero or greater than the permitted maximum.
> +For
> +.BR RWF_ATOMIC
> +set, this maximum is in

(same)

--D

> +.I stx_atomic_write_segments_max
> +from
> +.I statx.
>  .TP
>  .B EOPNOTSUPP
>  An unknown flag is specified in \fIflags\fP.
> --
> 2.31.1
>
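For readers following the review, the length and offset rules quoted above (power-of-2 total length, length within [stx_atomic_write_unit_min, stx_atomic_write_unit_max], naturally aligned offset) can be sketched as a small userspace pre-check. This is only an illustration, not kernel or man-pages code: the helper name `atomic_write_range_ok()` is hypothetical, and the 4 KiB / 64 KiB limits in the example are assumed values that a real program would read from statx().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if n is a nonzero power of two. */
bool is_power_of_2(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper (not a kernel or libc API): returns true if a write
 * of 'len' bytes at file offset 'offset' satisfies the rules in the
 * proposed RWF_ATOMIC text. 'unit_min' and 'unit_max' stand in for
 * stx_atomic_write_unit_min and stx_atomic_write_unit_max.
 */
bool atomic_write_range_ok(uint64_t offset, uint64_t len,
                           uint64_t unit_min, uint64_t unit_max)
{
    if (!is_power_of_2(len))
        return false;               /* total length must be a power of 2 */
    if (len < unit_min || len > unit_max)
        return false;               /* total length must lie in [min, max] */
    return (offset & (len - 1)) == 0;   /* offset naturally aligned to len */
}

/* The two cases from the patch text, with assumed limits 4 KiB..64 KiB. */
void atomic_write_range_example(void)
{
    assert(atomic_write_range_ok(32 * 1024, 32 * 1024, 4096, 65536));
    assert(!atomic_write_range_ok(48 * 1024, 32 * 1024, 4096, 65536));
}
```

The alignment test uses a mask rather than a modulo because the length has already been checked to be a power of two.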
On 17/07/2024 22:44, Darrick J. Wong wrote:
> On Wed, Jul 17, 2024 at 09:36:18AM +0000, John Garry wrote:
>> From: Himanshu Madhani <himanshu.madhani@oracle.com>
>>
>> Add RWF_ATOMIC flag description for pwritev2().
>>
>> Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
>> [jpg: complete rewrite]
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>>  man/man2/readv.2 | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 76 insertions(+)
>>
>> diff --git a/man/man2/readv.2 b/man/man2/readv.2
>> index eecde06dc..9c8a11324 100644
>> --- a/man/man2/readv.2
>> +++ b/man/man2/readv.2
>> @@ -193,6 +193,66 @@ which provides lower latency, but may use additional resources.
>>  .B O_DIRECT
>>  flag.)
>>  .TP
>> +.BR RWF_ATOMIC " (since Linux 6.11)"
>> +Requires that writes to regular files in block-based filesystems be issued with
>> +torn-write protection.
>> +Torn-write protection means that for a power or any other hardware failure,
>> +all or none of the data from the write will be stored,
>> +but never a mix of old and new data.
>> +This flag is meaningful only for
>> +.BR pwritev2 (),
>> +and its effect applies only to the data range written by the system call.
>> +The total write length must be power-of-2 and must be sized in the range
>> +.RI [ stx_atomic_write_unit_min ,
>> +.IR stx_atomic_write_unit_max ].
>> +The write must be at a naturally-aligned offset within the file with respect to
>> +the total write length -
>> +for example,
>
> Nit: these could be two sentences
>
> "The write must be at a naturally-aligned offset within the file with
> respect to the total write length. For example, ..."

ok, sure

>
>> +a write of length 32KB at a file offset of 32KB is permitted,
>> +however a write of length 32KB at a file offset of 48KB is not permitted.
>
> Pickier nit: KiB, not KB.

ok

>
>> +The upper limit of
>> +.I iovcnt
>> +for
>> +.BR pwritev2 ()
>> +is in
>
> "is given by" ?

ok, fine, I don't mind

>
>> +.I stx_atomic_write_segments_max.
>> +Torn-write protection only works with
>> +.B O_DIRECT
>> +flag, i.e. buffered writes are not supported.
>> +To guarantee consistency from the write between a file's in-core state with the
>> +storage device,
>> +.BR fdatasync (2),
>> +or
>> +.BR fsync (2),
>> +or
>> +.BR open (2)
>> +and either
>> +.B O_SYNC
>> +or
>> +.B O_DSYNC,
>> +or
>> +.B pwritev2 ()
>> +and either
>> +.B RWF_SYNC
>> +or
>> +.B RWF_DSYNC
>> +is required. Flags
>
> This sentence ^^ should start on a new line.

yes

>
>> +.B O_SYNC
>> +or
>> +.B RWF_SYNC
>> +provide the strongest guarantees for
>> +.BR RWF_ATOMIC,
>> +in that all data and also file metadata updates will be persisted for a
>> +successfully completed write.
>> +Just using either flags
>> +.B O_DSYNC
>> +or
>> +.B RWF_DSYNC
>> +means that all data and any file updates will be persisted for a successfully
>> +completed write.
>

ughh, this is hard to word both concisely and accurately...

> "any file updates" ? I /think/ the difference between O_SYNC and
> O_DSYNC is that O_DSYNC persists all data and file metadata updates for
> the file range that was written, whereas O_SYNC persists all data and
> file metadata updates for the entire file.

I think that https://man7.org/linux/man-pages/man2/open.2.html#NOTES
describes it best.

>
> Perhaps everything between "Flags O_SYNC or RWF_SYNC..." and "...for a
> successfully completed write." should instead refer readers to the notes
> about synchronized I/O flags in the openat manpage?

Maybe that would be better, but we just need to make it clear that
RWF_ATOMIC provides the guarantee that the data is atomically updated only
in addition to whatever guarantee we have for metadata updates from
O_SYNC/O_DSYNC.

So maybe:
RWF_ATOMIC provides the guarantee that any data is written with torn-write
protection, and additional flags O_SYNC or O_DSYNC provide
same Synchronized I/O guarantees as documented in <openat manpage reference>

OK?

>
>> +Not using any sync flags means that there is no guarantee that data or
>> +filesystem updates are persisted.
>> +.TP
>>  .BR RWF_SYNC " (since Linux 4.7)"
>>  .\" commit e864f39569f4092c2b2bc72c773b6e486c7e3bd9
>>  Provide a per-write equivalent of the
>> @@ -279,10 +339,26 @@ values overflows an
>>  .I ssize_t
>>  value.
>>  .TP
>> +.B EINVAL
>> + For
>> +.BR RWF_ATOMIC
>> +set,
>
> "If RWF_ATOMIC is specified..." ?
>
> (to be a bit more consistent with the language around the AT_* flags in
> openat)

ok, fine

>
>> +the combination of the sum of the
>> +.I iov_len
>> +values and the
>> +.I offset
>> +value does not comply with the length and offset torn-write protection rules.
>> +.TP
>>  .B EINVAL
>>  The vector count,
>>  .IR iovcnt ,
>>  is less than zero or greater than the permitted maximum.
>> +For
>> +.BR RWF_ATOMIC
>> +set, this maximum is in
>
> (same)
>
> --D
>

Thanks for checking,
John
On Thu, Jul 18, 2024 at 03:07:59PM +0100, John Garry wrote:
> On 17/07/2024 22:44, Darrick J. Wong wrote:
> > On Wed, Jul 17, 2024 at 09:36:18AM +0000, John Garry wrote:
> > > From: Himanshu Madhani <himanshu.madhani@oracle.com>
> > >
> > > Add RWF_ATOMIC flag description for pwritev2().
> > >
> > > Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
> > > [jpg: complete rewrite]
> > > Signed-off-by: John Garry <john.g.garry@oracle.com>
> > > ---
> > >  man/man2/readv.2 | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 76 insertions(+)
> > >
> > > diff --git a/man/man2/readv.2 b/man/man2/readv.2
> > > index eecde06dc..9c8a11324 100644
> > > --- a/man/man2/readv.2
> > > +++ b/man/man2/readv.2
> > > @@ -193,6 +193,66 @@ which provides lower latency, but may use additional resources.
> > >  .B O_DIRECT
> > >  flag.)
> > >  .TP
> > > +.BR RWF_ATOMIC " (since Linux 6.11)"
> > > +Requires that writes to regular files in block-based filesystems be issued with
> > > +torn-write protection.
> > > +Torn-write protection means that for a power or any other hardware failure,
> > > +all or none of the data from the write will be stored,
> > > +but never a mix of old and new data.
> > > +This flag is meaningful only for
> > > +.BR pwritev2 (),
> > > +and its effect applies only to the data range written by the system call.
> > > +The total write length must be power-of-2 and must be sized in the range
> > > +.RI [ stx_atomic_write_unit_min ,
> > > +.IR stx_atomic_write_unit_max ].
> > > +The write must be at a naturally-aligned offset within the file with respect to
> > > +the total write length -
> > > +for example,
> >
> > Nit: these could be two sentences
> >
> > "The write must be at a naturally-aligned offset within the file with
> > respect to the total write length. For example, ..."
>
> ok, sure
>
> >
> > > +a write of length 32KB at a file offset of 32KB is permitted,
> > > +however a write of length 32KB at a file offset of 48KB is not permitted.
> >
> > Pickier nit: KiB, not KB.
>
> ok
>
> >
> > > +The upper limit of
> > > +.I iovcnt
> > > +for
> > > +.BR pwritev2 ()
> > > +is in
> >
> > "is given by" ?
>
> ok, fine, I don't mind
>
> >
> > > +.I stx_atomic_write_segments_max.
> > > +Torn-write protection only works with
> > > +.B O_DIRECT
> > > +flag, i.e. buffered writes are not supported.
> > > +To guarantee consistency from the write between a file's in-core state with the
> > > +storage device,
> > > +.BR fdatasync (2),
> > > +or
> > > +.BR fsync (2),
> > > +or
> > > +.BR open (2)
> > > +and either
> > > +.B O_SYNC
> > > +or
> > > +.B O_DSYNC,
> > > +or
> > > +.B pwritev2 ()
> > > +and either
> > > +.B RWF_SYNC
> > > +or
> > > +.B RWF_DSYNC
> > > +is required. Flags
> >
> > This sentence ^^ should start on a new line.
>
> yes
>
> >
> > > +.B O_SYNC
> > > +or
> > > +.B RWF_SYNC
> > > +provide the strongest guarantees for
> > > +.BR RWF_ATOMIC,
> > > +in that all data and also file metadata updates will be persisted for a
> > > +successfully completed write.
> > > +Just using either flags
> > > +.B O_DSYNC
> > > +or
> > > +.B RWF_DSYNC
> > > +means that all data and any file updates will be persisted for a successfully
> > > +completed write.
> >
> ughh, this is hard to word both concisely and accurately...
>
> > "any file updates" ? I /think/ the difference between O_SYNC and
> > O_DSYNC is that O_DSYNC persists all data and file metadata updates for
> > the file range that was written, whereas O_SYNC persists all data and
> > file metadata updates for the entire file.
>
> I think that https://man7.org/linux/man-pages/man2/open.2.html#NOTES
> describes it best.
>
> >
> > Perhaps everything between "Flags O_SYNC or RWF_SYNC..." and "...for a
> > successfully completed write." should instead refer readers to the notes
> > about synchronized I/O flags in the openat manpage?
>
> Maybe that would be better, but we just need to make it clear that
> RWF_ATOMIC provides the guarantee that the data is atomically updated only
> in addition to whatever guarantee we have for metadata updates from
> O_SYNC/O_DSYNC.
>
>
> So maybe:
> RWF_ATOMIC provides the guarantee that any data is written with torn-write
> protection, and additional flags O_SYNC or O_DSYNC provide
> same Synchronized I/O guarantees as documented in <openat manpage reference>
  ^ the same
>
> OK?

Yes.

> > > +Not using any sync flags means that there is no guarantee that data or
> > > +filesystem updates are persisted.
> > > +.TP
> > >  .BR RWF_SYNC " (since Linux 4.7)"
> > >  .\" commit e864f39569f4092c2b2bc72c773b6e486c7e3bd9
> > >  Provide a per-write equivalent of the
> > > @@ -279,10 +339,26 @@ values overflows an
> > >  .I ssize_t
> > >  value.
> > >  .TP
> > > +.B EINVAL
> > > + For
> > > +.BR RWF_ATOMIC
> > > +set,
> >
> > "If RWF_ATOMIC is specified..." ?
> >
> > (to be a bit more consistent with the language around the AT_* flags in
> > openat)
>
> ok, fine
>
> >
> > > +the combination of the sum of the
> > > +.I iov_len
> > > +values and the
> > > +.I offset
> > > +value does not comply with the length and offset torn-write protection rules.
> > > +.TP
> > >  .B EINVAL
> > >  The vector count,
> > >  .IR iovcnt ,
> > >  is less than zero or greater than the permitted maximum.
> > > +For
> > > +.BR RWF_ATOMIC
> > > +set, this maximum is in
> >
> > (same)
> >
> > --D
> >
>
> Thanks for checking,

NP. :)

--D

> John
>
>
diff --git a/man/man2/readv.2 b/man/man2/readv.2
index eecde06dc..9c8a11324 100644
--- a/man/man2/readv.2
+++ b/man/man2/readv.2
@@ -193,6 +193,66 @@ which provides lower latency, but may use additional resources.
 .B O_DIRECT
 flag.)
 .TP
+.BR RWF_ATOMIC " (since Linux 6.11)"
+Requires that writes to regular files in block-based filesystems be issued with
+torn-write protection.
+Torn-write protection means that for a power or any other hardware failure,
+all or none of the data from the write will be stored,
+but never a mix of old and new data.
+This flag is meaningful only for
+.BR pwritev2 (),
+and its effect applies only to the data range written by the system call.
+The total write length must be power-of-2 and must be sized in the range
+.RI [ stx_atomic_write_unit_min ,
+.IR stx_atomic_write_unit_max ].
+The write must be at a naturally-aligned offset within the file with respect to
+the total write length -
+for example,
+a write of length 32KB at a file offset of 32KB is permitted,
+however a write of length 32KB at a file offset of 48KB is not permitted.
+The upper limit of
+.I iovcnt
+for
+.BR pwritev2 ()
+is in
+.I stx_atomic_write_segments_max.
+Torn-write protection only works with
+.B O_DIRECT
+flag, i.e. buffered writes are not supported.
+To guarantee consistency from the write between a file's in-core state with the
+storage device,
+.BR fdatasync (2),
+or
+.BR fsync (2),
+or
+.BR open (2)
+and either
+.B O_SYNC
+or
+.B O_DSYNC,
+or
+.B pwritev2 ()
+and either
+.B RWF_SYNC
+or
+.B RWF_DSYNC
+is required. Flags
+.B O_SYNC
+or
+.B RWF_SYNC
+provide the strongest guarantees for
+.BR RWF_ATOMIC,
+in that all data and also file metadata updates will be persisted for a
+successfully completed write.
+Just using either flags
+.B O_DSYNC
+or
+.B RWF_DSYNC
+means that all data and any file updates will be persisted for a successfully
+completed write.
+Not using any sync flags means that there is no guarantee that data or
+filesystem updates are persisted.
+.TP
 .BR RWF_SYNC " (since Linux 4.7)"
 .\" commit e864f39569f4092c2b2bc72c773b6e486c7e3bd9
 Provide a per-write equivalent of the
@@ -279,10 +339,26 @@ values overflows an
 .I ssize_t
 value.
 .TP
+.B EINVAL
+ For
+.BR RWF_ATOMIC
+set,
+the combination of the sum of the
+.I iov_len
+values and the
+.I offset
+value does not comply with the length and offset torn-write protection rules.
+.TP
 .B EINVAL
 The vector count,
 .IR iovcnt ,
 is less than zero or greater than the permitted maximum.
+For
+.BR RWF_ATOMIC
+set, this maximum is in
+.I stx_atomic_write_segments_max
+from
+.I statx.
 .TP
 .B EOPNOTSUPP
 An unknown flag is specified in \fIflags\fP.
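
The two EINVAL entries in the patch describe checks on the whole iovec array: the sum of the iov_len values together with the offset, and iovcnt against stx_atomic_write_segments_max. A caller could mirror those checks before issuing pwritev2() roughly as follows. This is a sketch under stated assumptions rather than what the kernel does: `atomic_writev_args_ok()` is a hypothetical name, the limits are passed in as plain integers instead of being read from statx(), and overflow of the summed length is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>    /* struct iovec */

/*
 * Hypothetical pre-check (not a kernel or libc API) mirroring the two
 * EINVAL conditions in the patch. 'unit_min', 'unit_max' and
 * 'segments_max' stand in for the stx_atomic_write_* fields from statx().
 * Overflow of the summed length is deliberately not checked here.
 */
bool atomic_writev_args_ok(const struct iovec *iov, int iovcnt,
                           uint64_t offset, uint64_t unit_min,
                           uint64_t unit_max, uint64_t segments_max)
{
    uint64_t total = 0;

    /* Vector count must not be negative or exceed the segment limit. */
    if (iovcnt < 0 || (uint64_t)iovcnt > segments_max)
        return false;

    /* The rules apply to the total length, not to each segment. */
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;

    if (total == 0 || (total & (total - 1)) != 0)
        return false;               /* total must be a power of 2 */
    if (total < unit_min || total > unit_max)
        return false;               /* total must lie in [min, max] */
    return offset % total == 0;     /* offset naturally aligned to total */
}

/* Two 16 KiB segments form one 32 KiB write; the limits are assumed. */
void atomic_writev_example(void)
{
    struct iovec iov[2] = {
        { .iov_base = NULL, .iov_len = 16 * 1024 },
        { .iov_base = NULL, .iov_len = 16 * 1024 },
    };

    assert(atomic_writev_args_ok(iov, 2, 32 * 1024, 4096, 65536, 4));
    assert(!atomic_writev_args_ok(iov, 2, 48 * 1024, 4096, 65536, 4));
    assert(!atomic_writev_args_ok(iov, 2, 32 * 1024, 4096, 65536, 1));
}
```

The helper only inspects lengths, so the example leaves iov_base as NULL; a real write would of course need valid buffers that also meet the O_DIRECT alignment requirements.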