Message ID | 20220120071215.123274-1-ebiggers@kernel.org (mailing list archive) |
---|---|
Series | add support for direct I/O with fscrypt using blk-crypto |
On Wed, Jan 19, 2022 at 11:12:10PM -0800, Eric Biggers wrote:
>
> Given the above, as far as I know the only remaining objection to this
> patchset would be that DIO constraints aren't sufficiently discoverable
> by userspace. Now, to put this in context, this is a longstanding issue
> with all Linux filesystems, except XFS which has XFS_IOC_DIOINFO. It's
> not specific to this feature, and it doesn't actually seem to be too
> important in practice; many other filesystem features place constraints
> on DIO, and f2fs even *only* allows fully FS block size aligned DIO.
> (And for better or worse, many systems using fscrypt already have
> out-of-tree patches that enable DIO support, and people don't seem to
> have trouble with the FS block size alignment requirement.)

It might make sense to use this as an opportunity to implement XFS_IOC_DIOINFO for ext4 and f2fs.

> I plan to propose a new generic ioctl to address the issue of DIO
> constraints being insufficiently discoverable. But until then, I'm
> wondering if people are willing to consider this patchset again, or
> whether it is considered blocked by this issue alone. (And if this
> patchset is still unacceptable, would it be acceptable with f2fs support
> only, given that f2fs *already* only allows FS block size aligned DIO?)

I think the patchset looks fine, but I'd really love to have a way for the alignment restrictions to be discoverable from the start.
On Thu, Jan 20, 2022 at 12:30:23AM -0800, Christoph Hellwig wrote: > On Wed, Jan 19, 2022 at 11:12:10PM -0800, Eric Biggers wrote: > > > > Given the above, as far as I know the only remaining objection to this > > patchset would be that DIO constraints aren't sufficiently discoverable > > by userspace. Now, to put this in context, this is a longstanding issue > > with all Linux filesystems, except XFS which has XFS_IOC_DIOINFO. It's > > not specific to this feature, and it doesn't actually seem to be too > > important in practice; many other filesystem features place constraints > > on DIO, and f2fs even *only* allows fully FS block size aligned DIO. > > (And for better or worse, many systems using fscrypt already have > > out-of-tree patches that enable DIO support, and people don't seem to > > have trouble with the FS block size alignment requirement.) > > It might make sense to use this as an opportunity to implement > XFS_IOC_DIOINFO for ext4 and f2fs. Hmm. A potential problem with DIOINFO is that it doesn't explicitly list the /file/ position alignment requirement: struct dioattr { __u32 d_mem; /* data buffer memory alignment */ __u32 d_miniosz; /* min xfer size */ __u32 d_maxiosz; /* max xfer size */ }; Since I /think/ fscrypt requires that directio writes be aligned to file block size, right? > > I plan to propose a new generic ioctl to address the issue of DIO > > constraints being insufficiently discoverable. But until then, I'm Which is what I suspect Eric meant by this sentence. :) > > wondering if people are willing to consider this patchset again, or > > whether it is considered blocked by this issue alone. (And if this > > patchset is still unacceptable, would it be acceptable with f2fs support > > only, given that f2fs *already* only allows FS block size aligned DIO?) > > I think the patchset looks fine, but I'd really love to have a way for > the alignment restrictions to be discoverable from the start. I agree. 
The mechanics of the patchset look ok to me, but it's very unfortunate that there's no way for userspace programs to ask the kernel about the directio geometry for a file.

Ever since we added reflink to XFS I've wanted to add a way to tell userspace that direct writes to a reflink(able) file will be much more efficient if they can align the io request to 1 fs block instead of 1 sector.

How about something like this:

struct dioattr2 {
	__u32 d_mem;		/* data buffer memory alignment */
	__u32 d_miniosz;	/* min xfer size */
	__u32 d_maxiosz;	/* max xfer size */

	/* file range must be aligned to this value */
	__u32 d_min_fpos;

	/* for optimal performance, align file range to this */
	__u32 d_opt_fpos;

	__u32 d_padding[11];
};

--D
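As a rough illustration of how userspace might consume the two file-position fields, here is a minimal sketch. It assumes the struct above were adopted unchanged; no such ioctl exists, so the struct is redeclared locally under a hypothetical name and the helper names are invented for this example. Reading "file range" as covering both the offset and the length, and treating a zero `d_min_fpos` as "no direct I/O", are assumptions too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the proposed struct; hypothetical, not a kernel ABI. */
struct dioattr2_sketch {
	uint32_t d_mem;		/* data buffer memory alignment */
	uint32_t d_miniosz;	/* min xfer size */
	uint32_t d_maxiosz;	/* max xfer size */
	uint32_t d_min_fpos;	/* file range must be aligned to this value */
	uint32_t d_opt_fpos;	/* for optimal performance, align to this */
	uint32_t d_padding[11];
};

/* Five fields plus eleven padding words: 16 * 4 bytes. */
static_assert(sizeof(struct dioattr2_sketch) == 64, "expected 64 bytes");

/* True if [pos, pos + len) meets the mandatory file-range alignment. */
bool dio_range_allowed(const struct dioattr2_sketch *da,
		       uint64_t pos, uint64_t len)
{
	if (da->d_min_fpos == 0)
		return false;	/* assumption: 0 means no direct I/O */
	return pos % da->d_min_fpos == 0 && len % da->d_min_fpos == 0;
}

/* Widen [*pos, *pos + *len) outward to the optimal alignment. */
void dio_range_widen_to_opt(const struct dioattr2_sketch *da,
			    uint64_t *pos, uint64_t *len)
{
	uint64_t align = da->d_opt_fpos ? da->d_opt_fpos : 1;
	uint64_t start = *pos - *pos % align;
	uint64_t end = *pos + *len;

	if (end % align)
		end += align - end % align;
	*pos = start;
	*len = end - start;
}
```

With a 512-byte minimum and a 4096-byte optimum, a request at offset 5000 for 1000 bytes would be rejected by the first helper and widened to offset 4096, length 4096 by the second.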
On Thu, Jan 20, 2022 at 09:10:27AM -0800, Darrick J. Wong wrote: > On Thu, Jan 20, 2022 at 12:30:23AM -0800, Christoph Hellwig wrote: > > On Wed, Jan 19, 2022 at 11:12:10PM -0800, Eric Biggers wrote: > > > > > > Given the above, as far as I know the only remaining objection to this > > > patchset would be that DIO constraints aren't sufficiently discoverable > > > by userspace. Now, to put this in context, this is a longstanding issue > > > with all Linux filesystems, except XFS which has XFS_IOC_DIOINFO. It's > > > not specific to this feature, and it doesn't actually seem to be too > > > important in practice; many other filesystem features place constraints > > > on DIO, and f2fs even *only* allows fully FS block size aligned DIO. > > > (And for better or worse, many systems using fscrypt already have > > > out-of-tree patches that enable DIO support, and people don't seem to > > > have trouble with the FS block size alignment requirement.) > > > > It might make sense to use this as an opportunity to implement > > XFS_IOC_DIOINFO for ext4 and f2fs. > > Hmm. A potential problem with DIOINFO is that it doesn't explicitly > list the /file/ position alignment requirement: > > struct dioattr { > __u32 d_mem; /* data buffer memory alignment */ > __u32 d_miniosz; /* min xfer size */ > __u32 d_maxiosz; /* max xfer size */ > }; Well, the comment above struct dioattr says: /* * Direct I/O attribute record used with XFS_IOC_DIOINFO * d_miniosz is the min xfer size, xfer size multiple and file seek offset * alignment. */ So d_miniosz serves that purpose already. > > Since I /think/ fscrypt requires that directio writes be aligned to file > block size, right? The file position must be a multiple of the filesystem block size, yes. Likewise for the "minimum xfer size" and "xfer size multiple", and the "data buffer memory alignment" for that matter. So I think XFS_IOC_DIOINFO would be good enough for the fscrypt direct I/O case. 
The real question is whether there are any direct I/O implementations where XFS_IOC_DIOINFO would *not* be good enough, for example due to "xfer size multiple" != "file seek offset alignment" being allowed. In that case we would need to define a new ioctl that is more general (like the one you described below) rather than simply uplifting XFS_IOC_DIOINFO. More general is nice, but it's not helpful if no one will actually use the extra information. So we need to figure out what is actually useful. > How about something like this: > > struct dioattr2 { > __u32 d_mem; /* data buffer memory alignment */ > __u32 d_miniosz; /* min xfer size */ > __u32 d_maxiosz; /* max xfer size */ > > /* file range must be aligned to this value */ > __u32 d_min_fpos; > > /* for optimal performance, align file range to this */ > __u32 d_opt_fpos; > > __u32 d_padding[11]; > }; > - Eric
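To make the triple role of d_miniosz concrete, the following is a minimal sketch of the check an application could run before issuing a direct I/O request. The struct is redeclared locally (under a hypothetical name) so the example builds without the XFS headers, and the helper name is invented here. Treating a zero value as "direct I/O unavailable" is an assumption, not something XFS_IOC_DIOINFO defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same three fields as the struct dioattr quoted above. */
struct dioattr_sketch {
	uint32_t d_mem;		/* data buffer memory alignment */
	uint32_t d_miniosz;	/* min xfer size, xfer size multiple and
				 * file offset alignment, per the comment */
	uint32_t d_maxiosz;	/* max xfer size */
};

static_assert(sizeof(struct dioattr_sketch) == 12, "three 32-bit fields");

/* True if a request passes every constraint the record expresses. */
bool dio_request_ok(const struct dioattr_sketch *da, const void *buf,
		    uint64_t pos, uint64_t len)
{
	if (da->d_mem == 0 || da->d_miniosz == 0)
		return false;	/* assumption: 0 means unavailable */
	if ((uintptr_t)buf % da->d_mem)
		return false;	/* buffer address misaligned */
	if (pos % da->d_miniosz)
		return false;	/* file offset misaligned */
	if (len < da->d_miniosz || len % da->d_miniosz)
		return false;	/* too short, or not a multiple */
	return len <= da->d_maxiosz;
}
```

If a filesystem ever allowed "xfer size multiple" to differ from "file seek offset alignment", the two d_miniosz checks above would need separate fields, which is the case the more general struct is meant to cover.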
On Thu, Jan 20, 2022 at 12:39:14PM -0800, Eric Biggers wrote: > On Thu, Jan 20, 2022 at 09:10:27AM -0800, Darrick J. Wong wrote: > > On Thu, Jan 20, 2022 at 12:30:23AM -0800, Christoph Hellwig wrote: > > > On Wed, Jan 19, 2022 at 11:12:10PM -0800, Eric Biggers wrote: > > > > > > > > Given the above, as far as I know the only remaining objection to this > > > > patchset would be that DIO constraints aren't sufficiently discoverable > > > > by userspace. Now, to put this in context, this is a longstanding issue > > > > with all Linux filesystems, except XFS which has XFS_IOC_DIOINFO. It's > > > > not specific to this feature, and it doesn't actually seem to be too > > > > important in practice; many other filesystem features place constraints > > > > on DIO, and f2fs even *only* allows fully FS block size aligned DIO. > > > > (And for better or worse, many systems using fscrypt already have > > > > out-of-tree patches that enable DIO support, and people don't seem to > > > > have trouble with the FS block size alignment requirement.) > > > > > > It might make sense to use this as an opportunity to implement > > > XFS_IOC_DIOINFO for ext4 and f2fs. > > > > Hmm. A potential problem with DIOINFO is that it doesn't explicitly > > list the /file/ position alignment requirement: > > > > struct dioattr { > > __u32 d_mem; /* data buffer memory alignment */ > > __u32 d_miniosz; /* min xfer size */ > > __u32 d_maxiosz; /* max xfer size */ > > }; > > Well, the comment above struct dioattr says: > > /* > * Direct I/O attribute record used with XFS_IOC_DIOINFO > * d_miniosz is the min xfer size, xfer size multiple and file seek offset > * alignment. > */ > > So d_miniosz serves that purpose already. > > > > > Since I /think/ fscrypt requires that directio writes be aligned to file > > block size, right? > > The file position must be a multiple of the filesystem block size, yes. 
> Likewise for the "minimum xfer size" and "xfer size multiple", and the "data > buffer memory alignment" for that matter. So I think XFS_IOC_DIOINFO would be > good enough for the fscrypt direct I/O case. Oh, ok then. In that case, just hoist XFS_IOC_DIOINFO to the VFS and add a couple of implementations for ext4 and f2fs, and I think that'll be enough to get the fscrypt patchset moving again. > The real question is whether there are any direct I/O implementations where > XFS_IOC_DIOINFO would *not* be good enough, for example due to "xfer size > multiple" != "file seek offset alignment" being allowed. In that case we would > need to define a new ioctl that is more general (like the one you described > below) rather than simply uplifting XFS_IOC_DIOINFO. I don't think there are any currently, but if anyone ever redesigns DIOINFO we might as well make all those pieces explicit. > More general is nice, but it's not helpful if no one will actually use the extra > information. So we need to figure out what is actually useful. <nod> Clearly I haven't wanted d_opt_fpos badly enough to propose revving the ioctl. ;) --D > > > How about something like this: > > > > struct dioattr2 { > > __u32 d_mem; /* data buffer memory alignment */ > > __u32 d_miniosz; /* min xfer size */ > > __u32 d_maxiosz; /* max xfer size */ > > > > /* file range must be aligned to this value */ > > __u32 d_min_fpos; > > > > /* for optimal performance, align file range to this */ > > __u32 d_opt_fpos; > > > > __u32 d_padding[11]; > > }; > > > > - Eric
On Thu, Jan 20, 2022 at 01:00:27PM -0800, Darrick J. Wong wrote: > On Thu, Jan 20, 2022 at 12:39:14PM -0800, Eric Biggers wrote: > > On Thu, Jan 20, 2022 at 09:10:27AM -0800, Darrick J. Wong wrote: > > > On Thu, Jan 20, 2022 at 12:30:23AM -0800, Christoph Hellwig wrote: > > > > On Wed, Jan 19, 2022 at 11:12:10PM -0800, Eric Biggers wrote: > > > > > > > > > > Given the above, as far as I know the only remaining objection to this > > > > > patchset would be that DIO constraints aren't sufficiently discoverable > > > > > by userspace. Now, to put this in context, this is a longstanding issue > > > > > with all Linux filesystems, except XFS which has XFS_IOC_DIOINFO. It's > > > > > not specific to this feature, and it doesn't actually seem to be too > > > > > important in practice; many other filesystem features place constraints > > > > > on DIO, and f2fs even *only* allows fully FS block size aligned DIO. > > > > > (And for better or worse, many systems using fscrypt already have > > > > > out-of-tree patches that enable DIO support, and people don't seem to > > > > > have trouble with the FS block size alignment requirement.) > > > > > > > > It might make sense to use this as an opportunity to implement > > > > XFS_IOC_DIOINFO for ext4 and f2fs. > > > > > > Hmm. A potential problem with DIOINFO is that it doesn't explicitly > > > list the /file/ position alignment requirement: > > > > > > struct dioattr { > > > __u32 d_mem; /* data buffer memory alignment */ > > > __u32 d_miniosz; /* min xfer size */ > > > __u32 d_maxiosz; /* max xfer size */ > > > }; > > > > Well, the comment above struct dioattr says: > > > > /* > > * Direct I/O attribute record used with XFS_IOC_DIOINFO > > * d_miniosz is the min xfer size, xfer size multiple and file seek offset > > * alignment. > > */ > > > > So d_miniosz serves that purpose already. > > > > > > > > Since I /think/ fscrypt requires that directio writes be aligned to file > > > block size, right? 
> > > > The file position must be a multiple of the filesystem block size, yes. > > Likewise for the "minimum xfer size" and "xfer size multiple", and the "data > > buffer memory alignment" for that matter. So I think XFS_IOC_DIOINFO would be > > good enough for the fscrypt direct I/O case. > > Oh, ok then. In that case, just hoist XFS_IOC_DIOINFO to the VFS and > add a couple of implementations for ext4 and f2fs, and I think that'll > be enough to get the fscrypt patchset moving again. On the contrary, I'd much prefer to see this information added to statx(). The file offset alignment info is a property of the current file (e.g. XFS can have different per-file requirements depending on whether the file data is hosted on the data or RT device, etc) and so it's not a fixed property of the filesystem. statx() was designed to be extended with per-file property information, and we already have stuff like filesystem block size in that syscall. Hence I would much prefer that we extend it with the DIO properties we need to support rather than "create" a new VFS ioctl to extract this information. We already have statx(), so let's use it for what it was intended for. > > The real question is whether there are any direct I/O implementations where > > XFS_IOC_DIOINFO would *not* be good enough, for example due to "xfer size > > multiple" != "file seek offset alignment" being allowed. In that case we would > > need to define a new ioctl that is more general (like the one you described > > below) rather than simply uplifting XFS_IOC_DIOINFO. > > I don't think there are any currently, but if anyone ever redesigns > DIOINFO we might as well make all those pieces explicit. > > > More general is nice, but it's not helpful if no one will actually use the extra > > information. So we need to figure out what is actually useful. > > <nod> Clearly I haven't wanted d_opt_fpos badly enough to propose > revving the ioctl. 
;)

I think the number of applications that use DIOINFO outside of xfsprogs/xfsdump/fstests can probably be counted on one hand. Debian code search tells me:

- qemu (under ifdef CONFIG_XFS)
- ceph 16.2 (seastar database support?)
- diod contains a copy of fsstress
- e2fsprogs contains a copy of fsstress
- openmpi (under ifdef SGIMPI)
- partclone - actually, that has a complete copy of the xfsprogs libxfs/ and include/ directory in it, so it's using the old libxfs_device_alignment() call that uses XFS_IOC_DIOINFO, and only when building the xfsclone binary.

Yup, I can count them on one 6 fingered hand, and their only use is when XFS filesystems are specifically discovered. :)

Hence I think it would be much more useful to application developers to include the IO alignment information in statx(), not to lift an ioctl that is pretty much unused and unknown outside the core XFS development environment....

Cheers,

Dave.
On Fri, Jan 21, 2022 at 09:04:14AM +1100, Dave Chinner wrote: > On Thu, Jan 20, 2022 at 01:00:27PM -0800, Darrick J. Wong wrote: > > On Thu, Jan 20, 2022 at 12:39:14PM -0800, Eric Biggers wrote: > > > On Thu, Jan 20, 2022 at 09:10:27AM -0800, Darrick J. Wong wrote: > > > > On Thu, Jan 20, 2022 at 12:30:23AM -0800, Christoph Hellwig wrote: > > > > > On Wed, Jan 19, 2022 at 11:12:10PM -0800, Eric Biggers wrote: > > > > > > > > > > > > Given the above, as far as I know the only remaining objection to this > > > > > > patchset would be that DIO constraints aren't sufficiently discoverable > > > > > > by userspace. Now, to put this in context, this is a longstanding issue > > > > > > with all Linux filesystems, except XFS which has XFS_IOC_DIOINFO. It's > > > > > > not specific to this feature, and it doesn't actually seem to be too > > > > > > important in practice; many other filesystem features place constraints > > > > > > on DIO, and f2fs even *only* allows fully FS block size aligned DIO. > > > > > > (And for better or worse, many systems using fscrypt already have > > > > > > out-of-tree patches that enable DIO support, and people don't seem to > > > > > > have trouble with the FS block size alignment requirement.) > > > > > > > > > > It might make sense to use this as an opportunity to implement > > > > > XFS_IOC_DIOINFO for ext4 and f2fs. > > > > > > > > Hmm. A potential problem with DIOINFO is that it doesn't explicitly > > > > list the /file/ position alignment requirement: > > > > > > > > struct dioattr { > > > > __u32 d_mem; /* data buffer memory alignment */ > > > > __u32 d_miniosz; /* min xfer size */ > > > > __u32 d_maxiosz; /* max xfer size */ > > > > }; > > > > > > Well, the comment above struct dioattr says: > > > > > > /* > > > * Direct I/O attribute record used with XFS_IOC_DIOINFO > > > * d_miniosz is the min xfer size, xfer size multiple and file seek offset > > > * alignment. > > > */ > > > > > > So d_miniosz serves that purpose already. 
> > > > > > > > > > > Since I /think/ fscrypt requires that directio writes be aligned to file > > > > block size, right? > > > > > > The file position must be a multiple of the filesystem block size, yes. > > > Likewise for the "minimum xfer size" and "xfer size multiple", and the "data > > > buffer memory alignment" for that matter. So I think XFS_IOC_DIOINFO would be > > > good enough for the fscrypt direct I/O case. > > > > Oh, ok then. In that case, just hoist XFS_IOC_DIOINFO to the VFS and > > add a couple of implementations for ext4 and f2fs, and I think that'll > > be enough to get the fscrypt patchset moving again. > > On the contrary, I'd much prefer to see this information added to > statx(). The file offset alignment info is a property of the current > file (e.g. XFS can have different per-file requirements depending on > whether the file data is hosted on the data or RT device, etc) and > so it's not a fixed property of the filesystem. > > statx() was designed to be extended with per-file property > information, and we already have stuff like filesystem block size in > that syscall. Hence I would much prefer that we extend it with the > DIO properties we need to support rather than "create" a new VFS > ioctl to extract this information. We already have statx(), so let's > use it for what it was intended for. > I assumed that XFS_IOC_DIOINFO *was* per-file. XFS's *implementation* of it looks at the filesystem only, but that would be the expected implementation if the DIO constraints don't currently vary between different files in XFS. If DIO constraints do in fact already vary between different files in XFS, is this just a bug in the XFS implementation of XFS_IOC_DIOINFO? Or was XFS_IOC_DIOINFO only ever intended to report per-filesystem state? If the latter, then yes, that would mean it wouldn't really be suitable to reuse to start reporting per-file state. (Per-file state is required for encrypted files. 
It's also required for other filesystem features; e.g., files that use compression or fs-verity don't support direct I/O at all.) - Eric
On Thu, Jan 20, 2022 at 02:48:52PM -0800, Eric Biggers wrote: > On Fri, Jan 21, 2022 at 09:04:14AM +1100, Dave Chinner wrote: > > On Thu, Jan 20, 2022 at 01:00:27PM -0800, Darrick J. Wong wrote: > > > On Thu, Jan 20, 2022 at 12:39:14PM -0800, Eric Biggers wrote: > > > > On Thu, Jan 20, 2022 at 09:10:27AM -0800, Darrick J. Wong wrote: > > > > > On Thu, Jan 20, 2022 at 12:30:23AM -0800, Christoph Hellwig wrote: > > > > > > On Wed, Jan 19, 2022 at 11:12:10PM -0800, Eric Biggers wrote: > > > > > > > > > > > > > > Given the above, as far as I know the only remaining objection to this > > > > > > > patchset would be that DIO constraints aren't sufficiently discoverable > > > > > > > by userspace. Now, to put this in context, this is a longstanding issue > > > > > > > with all Linux filesystems, except XFS which has XFS_IOC_DIOINFO. It's > > > > > > > not specific to this feature, and it doesn't actually seem to be too > > > > > > > important in practice; many other filesystem features place constraints > > > > > > > on DIO, and f2fs even *only* allows fully FS block size aligned DIO. > > > > > > > (And for better or worse, many systems using fscrypt already have > > > > > > > out-of-tree patches that enable DIO support, and people don't seem to > > > > > > > have trouble with the FS block size alignment requirement.) > > > > > > > > > > > > It might make sense to use this as an opportunity to implement > > > > > > XFS_IOC_DIOINFO for ext4 and f2fs. > > > > > > > > > > Hmm. 
A potential problem with DIOINFO is that it doesn't explicitly > > > > > list the /file/ position alignment requirement: > > > > > > > > > > struct dioattr { > > > > > __u32 d_mem; /* data buffer memory alignment */ > > > > > __u32 d_miniosz; /* min xfer size */ > > > > > __u32 d_maxiosz; /* max xfer size */ > > > > > }; > > > > > > > > Well, the comment above struct dioattr says: > > > > > > > > /* > > > > * Direct I/O attribute record used with XFS_IOC_DIOINFO > > > > * d_miniosz is the min xfer size, xfer size multiple and file seek offset > > > > * alignment. > > > > */ > > > > > > > > So d_miniosz serves that purpose already. > > > > > > > > > > > > > > Since I /think/ fscrypt requires that directio writes be aligned to file > > > > > block size, right? > > > > > > > > The file position must be a multiple of the filesystem block size, yes. > > > > Likewise for the "minimum xfer size" and "xfer size multiple", and the "data > > > > buffer memory alignment" for that matter. So I think XFS_IOC_DIOINFO would be > > > > good enough for the fscrypt direct I/O case. > > > > > > Oh, ok then. In that case, just hoist XFS_IOC_DIOINFO to the VFS and > > > add a couple of implementations for ext4 and f2fs, and I think that'll > > > be enough to get the fscrypt patchset moving again. > > > > On the contrary, I'd much prefer to see this information added to > > statx(). The file offset alignment info is a property of the current > > file (e.g. XFS can have different per-file requirements depending on > > whether the file data is hosted on the data or RT device, etc) and > > so it's not a fixed property of the filesystem. > > > > statx() was designed to be extended with per-file property > > information, and we already have stuff like filesystem block size in > > that syscall. Hence I would much prefer that we extend it with the > > DIO properties we need to support rather than "create" a new VFS > > ioctl to extract this information. 
We already have statx(), so let's
> > use it for what it was intended for.
> >
>
> I assumed that XFS_IOC_DIOINFO *was* per-file. XFS's *implementation* of it
> looks at the filesystem only,

You've got that wrong.

	case XFS_IOC_DIOINFO: {
>>>>>>		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
		struct dioattr		da;

		da.d_mem = da.d_miniosz = target->bt_logical_sectorsize;

xfs_inode_buftarg() is determining which block device the inode is storing its data on, so the returned dioattr values can be different for different inodes in the filesystem...

It's always been that way since the early Irix days - XFS RT devices could have very different IO constraints than the data device and DIO had to conform to the hardware limits underlying the filesystem. Hence the dioattr information has -always- been per-inode information.

> (Per-file state is required for encrypted
> files. It's also required for other filesystem features; e.g., files that use
> compression or fs-verity don't support direct I/O at all.)

Which is exactly why it should be a property of statx(), rather than try to re-use a ~30 year old filesystem specific API from a different OS that was never intended to indicate things like "DIO not supported on this file at all"....

We've been bitten many times by this "lift a rarely used filesystem specific ioctl to the VFS because it exists" method of API promotion. It almost always ends up in us discovering further down the track that there's something wrong with the API, it doesn't quite do what we need, we have to extend it anyway, or it's just plain borken, etc. And then we have to create a new, fit for purpose API anyway, and there's two VFS APIs we have to maintain forever instead of just one...

Can we learn from past mistakes this time instead of repeating them yet again?

Cheers,

Dave.
On Fri, Jan 21, 2022 at 10:57:55AM +1100, Dave Chinner wrote: > On Thu, Jan 20, 2022 at 02:48:52PM -0800, Eric Biggers wrote: > > On Fri, Jan 21, 2022 at 09:04:14AM +1100, Dave Chinner wrote: > > > On Thu, Jan 20, 2022 at 01:00:27PM -0800, Darrick J. Wong wrote: > > > > On Thu, Jan 20, 2022 at 12:39:14PM -0800, Eric Biggers wrote: > > > > > On Thu, Jan 20, 2022 at 09:10:27AM -0800, Darrick J. Wong wrote: > > > > > > On Thu, Jan 20, 2022 at 12:30:23AM -0800, Christoph Hellwig wrote: > > > > > > > On Wed, Jan 19, 2022 at 11:12:10PM -0800, Eric Biggers wrote: > > > > > > > > > > > > > > > > Given the above, as far as I know the only remaining objection to this > > > > > > > > patchset would be that DIO constraints aren't sufficiently discoverable > > > > > > > > by userspace. Now, to put this in context, this is a longstanding issue > > > > > > > > with all Linux filesystems, except XFS which has XFS_IOC_DIOINFO. It's > > > > > > > > not specific to this feature, and it doesn't actually seem to be too > > > > > > > > important in practice; many other filesystem features place constraints > > > > > > > > on DIO, and f2fs even *only* allows fully FS block size aligned DIO. > > > > > > > > (And for better or worse, many systems using fscrypt already have > > > > > > > > out-of-tree patches that enable DIO support, and people don't seem to > > > > > > > > have trouble with the FS block size alignment requirement.) > > > > > > > > > > > > > > It might make sense to use this as an opportunity to implement > > > > > > > XFS_IOC_DIOINFO for ext4 and f2fs. > > > > > > > > > > > > Hmm. 
A potential problem with DIOINFO is that it doesn't explicitly > > > > > > list the /file/ position alignment requirement: > > > > > > > > > > > > struct dioattr { > > > > > > __u32 d_mem; /* data buffer memory alignment */ > > > > > > __u32 d_miniosz; /* min xfer size */ > > > > > > __u32 d_maxiosz; /* max xfer size */ > > > > > > }; > > > > > > > > > > Well, the comment above struct dioattr says: > > > > > > > > > > /* > > > > > * Direct I/O attribute record used with XFS_IOC_DIOINFO > > > > > * d_miniosz is the min xfer size, xfer size multiple and file seek offset > > > > > * alignment. > > > > > */ > > > > > > > > > > So d_miniosz serves that purpose already. > > > > > > > > > > > > > > > > > Since I /think/ fscrypt requires that directio writes be aligned to file > > > > > > block size, right? > > > > > > > > > > The file position must be a multiple of the filesystem block size, yes. > > > > > Likewise for the "minimum xfer size" and "xfer size multiple", and the "data > > > > > buffer memory alignment" for that matter. So I think XFS_IOC_DIOINFO would be > > > > > good enough for the fscrypt direct I/O case. > > > > > > > > Oh, ok then. In that case, just hoist XFS_IOC_DIOINFO to the VFS and > > > > add a couple of implementations for ext4 and f2fs, and I think that'll > > > > be enough to get the fscrypt patchset moving again. > > > > > > On the contrary, I'd much prefer to see this information added to > > > statx(). The file offset alignment info is a property of the current > > > file (e.g. XFS can have different per-file requirements depending on > > > whether the file data is hosted on the data or RT device, etc) and > > > so it's not a fixed property of the filesystem. > > > > > > statx() was designed to be extended with per-file property > > > information, and we already have stuff like filesystem block size in > > > that syscall. 
Hence I would much prefer that we extend it with the > > > DIO properties we need to support rather than "create" a new VFS > > > ioctl to extract this information. We already have statx(), so let's > > > use it for what it was intended for. Eh, ok. Let's do that instead. > > > > > > > I assumed that XFS_IOC_DIOINFO *was* per-file. XFS's *implementation* of it > > looks at the filesystem only, > > You've got that wrong. > > case XFS_IOC_DIOINFO: { > >>>>>> struct xfs_buftarg *target = xfs_inode_buftarg(ip); > struct dioattr da; > > da.d_mem = da.d_miniosz = target->bt_logical_sectorsize; > > xfs_inode_buftarg() is determining which block device the inode is > storing it's data on, so the returned dioattr values can be > different for different inodes in the filesystem... > > It's always been that way since the early Irix days - XFS RT devices > could have very different IO constraints than the data device and > DIO had to conform to the hardware limits underlying the filesystem. > Hence the dioattr information has -always- been per-inode > information. > > > (Per-file state is required for encrypted > > files. It's also required for other filesystem features; e.g., files that use > > compression or fs-verity don't support direct I/O at all.) > > Which is exactly why is should be a property of statx(), rather than > try to re-use a ~30 year old filesystem specific API from a > different OS that was never intended to indicate things like "DIO > not supported on this file at all".... Heh. You mean like ALLOCSP? Ok ok point taken. > We've been bitten many times by this "lift a rarely used filesystem > specific ioctl to the VFS because it exists" method of API > promotion. It almost always ends up in us discovering further down > the track that there's something wrong with the API, it doesn't > quite do what we need, we have to extend it anyway, or it's just > plain borken, etc. 
And then we have to create a new, fit for purpose > API anyway, and there's two VFS APIs we have to maintain forever > instead of just one... > > Can we learn from past mistakes this time instead of repeating them > yet again? Sure. How's this? I couldn't think of a real case of directio requiring different alignments for pos and bytecount, so the only real addition here is the alignment requirements for best performance. struct statx { ... /* 0x90 */ __u64 stx_mnt_id; /* Memory buffer alignment required for directio, in bytes. */ __u32 stx_dio_mem_align; /* File range alignment required for directio, in bytes. */ __u32 stx_dio_fpos_align_min; /* 0xa0 */ /* File range alignment needed for best performance, in bytes. */ __u32 stx_dio_fpos_align_opt; /* Maximum size of a directio request, in bytes. */ __u32 stx_dio_max_iosize; __u64 __spare3[11]; /* Spare space for future expansion */ /* 0x100 */ }; Along with: #define STATX_DIRECTIO 0x00001000U /* Want/got directio geometry */ How about that? --D > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
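For illustration only, here is a sketch of how an application might consume the fields proposed in the message above, including the want/got convention of the mask bit. Nothing here is a real kernel interface: the struct holds just the mask and the four proposed fields under a hypothetical name, the flag reuses the value from the proposal under a suffixed name, and the helper is invented for this example. It also assumes that a zero minimum alignment means direct I/O is unavailable and that a zero maximum means "no limit".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value taken from the proposal; the name is local to this sketch. */
#define STATX_DIRECTIO_SKETCH 0x00001000U

/* Hypothetical subset of the proposed struct statx additions. */
struct statx_dio_sketch {
	uint32_t stx_mask;		 /* which fields were filled in */
	uint32_t stx_dio_mem_align;	 /* buffer alignment, bytes */
	uint32_t stx_dio_fpos_align_min; /* required range alignment */
	uint32_t stx_dio_fpos_align_opt; /* preferred range alignment */
	uint32_t stx_dio_max_iosize;	 /* largest request, bytes */
};

/*
 * Choose a chunk size no larger than 'want' that honours the reported
 * geometry. Returns 0 if no geometry was reported or nothing fits.
 */
uint32_t dio_chunk_size(const struct statx_dio_sketch *stx, uint32_t want)
{
	uint32_t min, opt, size = want;

	assert(stx != NULL);
	if (!(stx->stx_mask & STATX_DIRECTIO_SKETCH))
		return 0;	/* "got" bit clear: fields are not valid */
	min = stx->stx_dio_fpos_align_min;
	if (min == 0)
		return 0;	/* assumption: direct I/O unavailable */
	if (stx->stx_dio_max_iosize && size > stx->stx_dio_max_iosize)
		size = stx->stx_dio_max_iosize;
	opt = stx->stx_dio_fpos_align_opt;
	/* Prefer the optimal alignment when the chunk is large enough. */
	if (opt >= min && opt % min == 0 && size >= opt)
		return size - size % opt;
	return size - size % min;
}
```

The caller would still need to align the buffer to stx_dio_mem_align and the starting offset to the same minimum; the sketch covers only the size calculation.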
On Thu, Jan 20, 2022 at 06:36:03PM -0800, Darrick J. Wong wrote:
> Sure. How's this? I couldn't think of a real case of directio
> requiring different alignments for pos and bytecount, so the only real
> addition here is the alignment requirements for best performance.

While I see some benefits of adding the information to a catchall like statx we really need to be careful to not bloat the structure like crazy.

> struct statx {
> ...
> /* 0x90 */
> 	__u64 stx_mnt_id;
>
> 	/* Memory buffer alignment required for directio, in bytes. */
> 	__u32 stx_dio_mem_align;
>
> 	/* File range alignment required for directio, in bytes. */
> 	__u32 stx_dio_fpos_align_min;

So this really needs a good explanation why we need both given that we had no real use case for this.

> 	/* File range alignment needed for best performance, in bytes. */
> 	__u32 stx_dio_fpos_align_opt;

And why we really care about this. I guess you want to allow sector size dio in reflink setups, but discourage it. But is this really as important?

> 	/* Maximum size of a directio request, in bytes. */
> 	__u32 stx_dio_max_iosize;

I know XFS_IOC_DIOINFO had this, but does it really make much sense? Why do we need it for direct I/O and not buffered I/O?
On Thu, Jan 20, 2022 at 06:36:03PM -0800, Darrick J. Wong wrote:
> On Fri, Jan 21, 2022 at 10:57:55AM +1100, Dave Chinner wrote:
> Sure.  How's this?  I couldn't think of a real case of directio
> requiring different alignments for pos and bytecount, so the only real
> addition here is the alignment requirements for best performance.
>
> struct statx {
> ...
>     /* 0x90 */
>     __u64 stx_mnt_id;
>
>     /* Memory buffer alignment required for directio, in bytes. */
>     __u32 stx_dio_mem_align;

    __u32 stx_mem_align_dio;

(for consistency with suggestions below)

>
>     /* File range alignment required for directio, in bytes. */
>     __u32 stx_dio_fpos_align_min;

"fpos" is not really a user term - "offset" is the userspace term for
file position, and it's much less of a random letter salad if it's
named that way. Also, we don't need "min" in the name; the
description of the field in the man page can give all the gory
details about it being the minimum required alignment.

    __u32 stx_offset_align_dio;

>
>     /* 0xa0 */
>
>     /* File range alignment needed for best performance, in bytes. */
>     __u32 stx_dio_fpos_align_opt;

This is a common property of both DIO and buffered IO, so no need
for it to be dio-only property.

    __u32 stx_offset_align_optimal;

>
>     /* Maximum size of a directio request, in bytes. */
>     __u32 stx_dio_max_iosize;

Unnecessary, it will always be the syscall max IO size, because the
internal DIO code will slice and dice it down to the max sizes the
hardware supports.

> #define STATX_DIRECTIO 0x00001000U /* Want/got directio geometry */
>
> How about that?

Mostly seems reasonable at a first look.

Cheers,

Dave.
On Mon, Jan 24, 2022 at 10:03:32AM +1100, Dave Chinner wrote:
> >
> >     /* 0xa0 */
> >
> >     /* File range alignment needed for best performance, in bytes. */
> >     __u32 stx_dio_fpos_align_opt;
>
> This is a common property of both DIO and buffered IO, so no need
> for it to be dio-only property.
>
>     __u32 stx_offset_align_optimal;
>

Looking at this more closely: will stx_offset_align_optimal actually be useful,
given that st[x]_blksize already exists?

From the stat(2) and statx(2) man pages:

    st_blksize
        This field gives the "preferred" block size for efficient
        filesystem I/O.

    stx_blksize
        The "preferred" block size for efficient filesystem I/O.
        (Writing to a file in smaller chunks may cause an inefficient
        read-modify-rewrite.)

File offsets aren't explicitly mentioned, but I think it's implied they should
be a multiple of st[x]_blksize, just like the I/O size.  Otherwise, the I/O
would obviously require reading/writing partial blocks.

So, the proposed stx_offset_align_optimal field sounds like the same thing to
me.  Is there anything I'm misunderstanding?

Putting stx_offset_align_optimal behind the STATX_DIRECTIO flag would also be
confusing if it would apply to both direct and buffered I/O.

- Eric
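As a rough sketch of the reading of st_blksize described above (both the offset and the length being multiples of the preferred block size), an application could test a request like this. The helper names are hypothetical; stat(2) and st_blksize are the standard POSIX interfaces, and returning 0 on failure is a convention chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>

/* True if [off, off + len) starts and ends on a blksize boundary,
 * i.e. the I/O never touches a partial block. */
bool io_covers_whole_blocks(uint64_t off, uint64_t len, uint64_t blksize)
{
    assert(blksize != 0);
    return off % blksize == 0 && len % blksize == 0;
}

/* Look up st_blksize for a path; returns 0 if stat() fails. */
uint64_t preferred_blksize(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;
    return (uint64_t)st.st_blksize;
}
```

With a 4096-byte st_blksize, a 4096-byte write at offset 8192 covers whole blocks, while the same write at offset 512 straddles two blocks and would need a read-modify-write on each.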
On Tue, Feb 08, 2022 at 05:10:03PM -0800, Eric Biggers wrote:
> On Mon, Jan 24, 2022 at 10:03:32AM +1100, Dave Chinner wrote:
> > >
> > >     /* 0xa0 */
> > >
> > >     /* File range alignment needed for best performance, in bytes. */
> > >     __u32 stx_dio_fpos_align_opt;
> >
> > This is a common property of both DIO and buffered IO, so no need
> > for it to be dio-only property.
> >
> >     __u32 stx_offset_align_optimal;
> >
>
> Looking at this more closely: will stx_offset_align_optimal actually be useful,
> given that st[x]_blksize already exists?

Yes, because....

> From the stat(2) and statx(2) man pages:
>
>     st_blksize
>         This field gives the "preferred" block size for efficient
>         filesystem I/O.
>
>     stx_blksize
>         The "preferred" block size for efficient filesystem I/O.
>         (Writing to a file in smaller chunks may cause an inefficient
>         read-modify-rewrite.)

... historically speaking, this is intended to avoid RMW cycles for
sub-block and/or sub-PAGE_SIZE write() IOs. i.e. the practical
definition of st_blksize is the *minimum* IO size needed to avoid
page cache RMW cycles.

However, XFS has a "-o largeio" mount option, that sets this value
to internal optimal filesystem alignment values such as stripe unit
or even stripe width (-o largeio,swalloc). This means it can be up
to 2GB (maybe larger?) in size.

The problem with this is that many applications are not prepared to
see a value of, say, 16MB in st_blksize rather than 4096 bytes. An
example of such problems is applications sizing their IO buffers as
a multiple of st_blksize - we've had applications fail because they
try to use multi-GB sized IO buffers as a result of setting
st_blksize to the filesystem/storage idea of optimal IO size rather
than PAGE_SIZE.

Hence, we can't really change the value of st_blksize without
risking random breakage in userspace. Hence, the practical definition
of st_blksize is the *minimum* IO size that avoids RMW cycles for an
individual write() syscall, not the most efficient IO size.
> File offsets aren't explicitly mentioned, but I think it's implied they should
> be a multiple of st[x]_blksize, just like the I/O size.  Otherwise, the I/O
> would obviously require reading/writing partial blocks.

Of course it implies aligned file offsets - block aligned IO is
absolutely necessary for efficient filesystem IO. It has been for
pretty much the entirety of Unix history...

> So, the proposed stx_offset_align_optimal field sounds like the same thing to
> me.  Is there anything I'm misunderstanding?
>
> Putting stx_offset_align_optimal behind the STATX_DIRECTIO flag would also be
> confusing if it would apply to both direct and buffered I/O.

So just name the flag STATX_IOALIGN so that it can cover generic,
buffered specific and DIO specific parameters in one hit. Simple,
yes?

Cheers,

Dave.
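To make the buffer-sizing failure mode described above concrete: an application that allocates "N blocks" of st_blksize gets a sensible buffer at 4096 bytes but a gigabyte-scale one at 16MB. The sketch below shows one defensive way an application could bound that. The function name, the cap parameter, and the policy of never going below one block are all assumptions of this sketch, not something proposed in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Pick an I/O buffer size that is a whole multiple of blksize, aims for
 * nblocks blocks, and does not exceed cap unless a single block already
 * does. All names and the policy itself are illustrative assumptions. */
uint64_t pick_buffer_size(uint64_t blksize, uint64_t nblocks, uint64_t cap)
{
    uint64_t max_blocks;

    assert(blksize != 0 && nblocks != 0);
    /* Never go below one block, even if that exceeds the cap. */
    if (blksize >= cap)
        return blksize;
    /* At least 1 here, and nblocks * blksize cannot overflow past cap. */
    max_blocks = cap / blksize;
    if (nblocks > max_blocks)
        nblocks = max_blocks;
    return nblocks * blksize;
}
```

With a 1 MiB cap, asking for 64 blocks yields 256 KiB when st_blksize is 4096, but only a single 16 MiB block when st_blksize is 16 MiB, rather than the 1 GiB that an unbounded 64 * st_blksize would request.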