Message ID | 20250319114402.3757248-1-john.g.garry@oracle.com (mailing list archive) |
---|---|
State | New |
Series | [RFC] statx.2: Add stx_atomic_write_unit_max_opt |
On Wed, Mar 19, 2025 at 11:44:02AM +0000, John Garry wrote:
> XFS supports atomic writes - or untorn writes - based on different methods:
> - HW offload in the disk
> - Software emulation
>
> The value reported in stx_atomic_write_unit_max will be the max of the
> software emulation method.

I don't think emulation is a good word. A file system implementing
file systems things is not emulation.

> We want STATX_WRITE_ATOMIC to get this new member in addition to the
> already-existing members, so mention that a value of 0 means that
> stx_atomic_write_unit_max holds this limit.

Does that actually work? Can userspace assume all unknown statx
fields are padded to zero? If so my dio read align change could have
done away with the extra flag.
On 20/03/2025 07:00, Christoph Hellwig wrote:
> On Wed, Mar 19, 2025 at 11:44:02AM +0000, John Garry wrote:
>> XFS supports atomic writes - or untorn writes - based on different methods:
>> - HW offload in the disk
>> - Software emulation
>>
>> The value reported in stx_atomic_write_unit_max will be the max of the
>> software emulation method.
>
> I don't think emulation is a good word. A file system implementing
> file systems things is not emulation.

Sure, I am still in the mindset that a filesystem-based atomic write is
a 2nd-class citizen and just trying to emulate what can be done in the
disk.

>> We want STATX_WRITE_ATOMIC to get this new member in addition to the
>> already-existing members, so mention that a value of 0 means that
>> stx_atomic_write_unit_max holds this limit.
>
> Does that actually work? Can userspace assume all unknown statx
> fields are padded to zero? If so my dio read align change could have
> done away with the extra flag.

I will double check that, but if we needed to add another mask just for
getting this, then yuck.

But is there value in reporting this limit? I am not sure. I am not sure
what the user would do with this info.

Maybe, for example, they want to write 1K consecutive 16K pages, each
atomically, and decide to do a big 16M atomic write but find that it is
slow as bdev atomic limit is < 16M.

Maybe I should just update the documentation to mention that for XFS they
should check the mounted bdev atomic limits.
On Thu, Mar 20, 2025 at 09:19:40AM +0000, John Garry wrote:
> But is there value in reporting this limit? I am not sure. I am not sure
> what the user would do with this info.

Align their data structures to it, e.g. size the log buffers to it.

> Maybe, for example, they want to write 1K consecutive 16K pages, each
> atomically, and decide to do a big 16M atomic write but find that it is
> slow as bdev atomic limit is < 16M.
>
> Maybe I should just update the documentation to mention that for XFS they
> should check the mounted bdev atomic limits.

For something working on files having to figure out the underlying
block device (which is non-trivial given the various methods of
multi-device support) and then looking into block sysfs is a no-go.

So if we have any sort of use case for it we should expose the limit.
On 20/03/2025 14:12, Christoph Hellwig wrote:
> On Thu, Mar 20, 2025 at 09:19:40AM +0000, John Garry wrote:
>> But is there value in reporting this limit? I am not sure. I am not sure
>> what the user would do with this info.
>
> Align their data structures to it, e.g. size the log buffers to it.

Sure, there may be a usecase there.

So far I am just considering the DB usecase, and they know the atomic
write size which they want to do, i.e. their internal page size, and
align to that. If that internal page size <= this opt limit, then good.

>> Maybe, for example, they want to write 1K consecutive 16K pages, each
>> atomically, and decide to do a big 16M atomic write but find that it is
>> slow as bdev atomic limit is < 16M.
>>
>> Maybe I should just update the documentation to mention that for XFS they
>> should check the mounted bdev atomic limits.
>
> For something working on files having to figure out the underlying
> block device (which is non-trivial given the various methods of
> multi-device support) and then looking into block sysfs is a no-go.
>
> So if we have any sort of use case for it we should expose the limit.

Coming back to what was discussed about not adding a new flag to fetch
this limit:

> Does that actually work? Can userspace assume all unknown statx
> fields are padded to zero?

In cp_statx, we do pre-zero the statx structure. As such, the rule "if
zero, just use hard limit unit max" seems to hold.

> If so my dio read align change could have
> done away with the extra flag.

Sounds like it. Maybe this practice is not preferred, i.e. changing what
the request/result mask returns.
On Fri, Mar 21, 2025 at 10:20:21AM +0000, John Garry wrote:
> Coming back to what was discussed about not adding a new flag to fetch this
> limit:
>
> > Does that actually work? Can userspace assume all unknown statx
> > fields are padded to zero?
>
> In cp_statx, we do pre-zero the statx structure. As such, the rule "if
> zero, just use hard limit unit max" seems to hold.

Ok, can we document this somewhere?
diff --git a/man/man2/statx.2 b/man/man2/statx.2
index 0abac75c1..c3872f05d 100644
--- a/man/man2/statx.2
+++ b/man/man2/statx.2
@@ -79,6 +79,9 @@ struct statx {
 \&
     /* File offset alignment for direct I/O reads */
     __u32 stx_dio_read_offset_align;
+\&
+    /* Direct I/O atomic write opt max limit */
+    __u32 stx_atomic_write_unit_max_opt;
 };
 .EE
 .in
@@ -271,7 +274,8 @@ STATX_SUBVOL	Want stx_subvol
 	(since Linux 6.10; support varies by filesystem)
 STATX_WRITE_ATOMIC	Want stx_atomic_write_unit_min,
 	stx_atomic_write_unit_max,
-	and stx_atomic_write_segments_max.
+	stx_atomic_write_segments_max,
+	and stx_atomic_write_unit_max_opt.
 	(since Linux 6.11; support varies by filesystem)
 STATX_DIO_READ_ALIGN	Want stx_dio_read_offset_align.
 	(since Linux 6.14; support varies by filesystem)
@@ -519,6 +523,15 @@ is supported on block devices since Linux 6.11.
 The support on regular files varies by filesystem;
 it is supported by xfs and ext4 since Linux 6.13.
 .TP
+.I stx_atomic_write_unit_max_opt
+The maximum size (in bytes) which is optimised for fast
+untorn writes.
+This value must not exceed the value in
+.I stx_atomic_write_unit_max.
+A value of 0 indicates that
+.I stx_atomic_write_unit_max
+is the optimised limit.
+.TP
 .I stx_atomic_write_segments_max
 The maximum number of elements in an array of vectors
 for a write with torn-write protection enabled.
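The zero-means-fallback rule in the proposed man page text can be sketched as a small userspace helper. This is only an illustration: `atomic_write_opt_limit()` is a hypothetical name, not part of any kernel or libc API, and it takes the two values as plain integers (as they would be read from `stx_atomic_write_unit_max` and the proposed `stx_atomic_write_unit_max_opt`) so that it does not depend on headers that already carry the new member.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of any existing API.
 * unit_max:     value of stx_atomic_write_unit_max
 * unit_max_opt: value of the proposed stx_atomic_write_unit_max_opt
 * Returns the largest size expected to take the optimised path.
 */
static uint32_t atomic_write_opt_limit(uint32_t unit_max, uint32_t unit_max_opt)
{
    uint32_t limit;

    if (unit_max_opt == 0)
        limit = unit_max;     /* 0: unit_max is itself the optimised limit */
    else if (unit_max_opt > unit_max)
        limit = unit_max;     /* defensive clamp; the proposed text forbids this case */
    else
        limit = unit_max_opt;

    assert(limit <= unit_max); /* post-condition of the three branches above */
    return limit;
}
```

With a 16M `unit_max` and a 64K `unit_max_opt` the helper returns 64K; on a kernel that never fills in the new member (so it reads as 0) it returns the 16M `unit_max` unchanged.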
XFS supports atomic writes - or untorn writes - based on different methods:
- HW offload in the disk
- Software emulation

The value reported in stx_atomic_write_unit_max will be the max of the
software emulation method.

The max atomic write unit size of the software emulated atomic writes will
generally be much larger than the HW offload. However, software emulated
atomic writes will also be typically much slower.

The filesystem will transparently support both methods, specifically HW
offload is the preferred method when possible, e.g. if write size is small
enough then HW offload will be used.

Advertise this HW offload limit to the user in a new statx member,
stx_atomic_write_unit_max_opt.

We want STATX_WRITE_ATOMIC to get this new member in addition to the
already-existing members, so mention that a value of 0 means that
stx_atomic_write_unit_max holds this limit.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
I'm sending as an RFC as I am not sure if we need bother with this.

Maybe it's better to update the man page to mention that software emulated
atomic writes are available, and the user should check the mounted bdev
atomic write limits instead to know this opt limit.
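As a rough illustration of how an application might act on the two limits described above, the sketch below classifies a candidate atomic write length. The enum and function names are hypothetical, and the "fast" and "slow" labels are only an expectation based on the commit message, not a guarantee of which method the filesystem picks. The power-of-two and `[unit_min, unit_max]` range checks follow the rules documented for `RWF_ATOMIC` writes; treat them as an assumption if your man pages differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification, for illustration only. */
enum atomic_write_class {
    ATOMIC_WRITE_UNSUPPORTED, /* length cannot be written atomically */
    ATOMIC_WRITE_FAST,        /* within the optimised limit */
    ATOMIC_WRITE_SLOW         /* allowed, but above the optimised limit */
};

static enum atomic_write_class
classify_atomic_write(uint64_t len, uint32_t unit_min, uint32_t unit_max,
                      uint32_t unit_max_opt)
{
    uint32_t opt;

    /* Reject empty, non-power-of-two, and out-of-range lengths. */
    if (len == 0 || (len & (len - 1)) != 0)
        return ATOMIC_WRITE_UNSUPPORTED;
    if (unit_max == 0 || len < unit_min || len > unit_max)
        return ATOMIC_WRITE_UNSUPPORTED;

    /* 0 (or an out-of-spec larger value) falls back to unit_max. */
    opt = (unit_max_opt == 0 || unit_max_opt > unit_max) ? unit_max
                                                         : unit_max_opt;
    assert(opt <= unit_max);

    return len <= opt ? ATOMIC_WRITE_FAST : ATOMIC_WRITE_SLOW;
}
```

For the database case raised in the thread, a 16K page with limits of 4K, 16M and a 64K optimised limit would be classified as fast, while a single 16M write would be allowed but classified as slow, which suggests issuing the pages individually instead.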