Message ID | 7c12819-9b94-d56-ff88-35623aa34180@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | tmpfs: user xattrs and direct IO | expand |
On Tue 08-08-23 21:34:54, Hugh Dickins wrote: > Depending upon your philosophical viewpoint, either tmpfs always does > direct IO, or it cannot ever do direct IO; but whichever, if tmpfs is to > stand in for a more sophisticated filesystem, it can be helpful for tmpfs > to support O_DIRECT. So, give tmpfs a shmem_direct_IO() method, of the > simplest kind: by just returning 0 done, it leaves all the work to the > buffered fallback (and everything else just happens to work out okay - > in particular, its dirty pages don't get lost to invalidation). > > xfstests auto generic which were not run on tmpfs before but now pass: > 036 091 113 125 130 133 135 198 207 208 209 210 211 212 214 226 239 263 > 323 355 391 406 412 422 427 446 451 465 551 586 591 609 615 647 708 729 > with no new failures. > > LTP dio tests which were not run on tmpfs before but now pass: > dio01 through dio30, except for dio04 and dio10, which fail because > tmpfs dio read and write allow odd count: tmpfs could be made stricter, > but would that be an improvement? > > Signed-off-by: Hugh Dickins <hughd@google.com> Yeah, we are not quite consistent about whether it is better to silently fallback to buffered IO or return error among filesystems. So I guess whatever you like. Feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > mm/shmem.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/mm/shmem.c b/mm/shmem.c > index 7420b510a9f3..4d5599e566df 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -2720,6 +2720,16 @@ shmem_write_end(struct file *file, struct address_space *mapping, > return copied; > } > > +static ssize_t shmem_direct_IO(struct kiocb *iocb, struct iov_iter *iter) > +{ > + /* > + * Just leave all the work to the buffered fallback. > + * Some LTP tests may expect us to enforce alignment restrictions, > + * but the fallback works just fine with any alignment, so allow it. > + */ > + return 0; > +} > + > static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > { > struct file *file = iocb->ki_filp; > @@ -4421,6 +4431,7 @@ const struct address_space_operations shmem_aops = { > #ifdef CONFIG_TMPFS > .write_begin = shmem_write_begin, > .write_end = shmem_write_end, > + .direct_IO = shmem_direct_IO, > #endif > #ifdef CONFIG_MIGRATION > .migrate_folio = migrate_folio, > -- > 2.35.3 >
Please do not add a new ->direct_IO method. I'm currently working hard on removing it, just set FMODE_CAN_ODIRECT and handle the fallback in your read_iter/write_iter methods. But if we just start claiming direct I/O support for file systems that don't actually support it, I'm starting to seriously wonder why we bother with the flag at all and don't just allow O_DIRECT opens to always succeed..
On Wed, Aug 09, 2023 at 06:41:17AM -0700, Christoph Hellwig wrote: > Please do not add a new ->direct_IO method. I'm currently working hard > on removing it, just set FMODE_CAN_ODIRECT and handle the fallback in > your read_iter/write_iter methods. > > But if we just start claiming direct I/O support for file systems that > don't actually support it, I'm starting to seriously wonder why we > bother with the flag at all and don't just allow O_DIRECT opens > to always succeed.. I see it differently -- you can do byte-aligned directio to S_DAX files on persistent memory, so I don't see why you can't do that for tmpfs files too. (I'm not advocating for letting *disk* based filesystems allow O_DIRECT even if read and writes are always going to go through the page cache and get flushed to disk. If programs wanted that, they'd use O_SYNC.) /mnt is a pmem filesystem, /mnt/on/file has S_DAX set, and /mnt/off/file does not: # xfs_io -c statx /mnt/{on,off}/file fd.path = "/mnt/on/file" fd.flags = non-sync,non-direct,read-write stat.ino = 132 stat.type = regular file stat.size = 1048576 stat.blocks = 2048 fsxattr.xflags = 0x8002 [-p------------x--] fsxattr.projid = 0 fsxattr.extsize = 0 fsxattr.cowextsize = 0 fsxattr.nextents = 1 fsxattr.naextents = 0 dioattr.mem = 0x200 dioattr.miniosz = 512 dioattr.maxiosz = 2147483136 fd.path = "/mnt/off/file" fd.flags = non-sync,non-direct,read-write stat.ino = 8388737 stat.type = regular file stat.size = 1048576 stat.blocks = 2048 fsxattr.xflags = 0x2 [-p---------------] fsxattr.projid = 0 fsxattr.extsize = 0 fsxattr.cowextsize = 0 fsxattr.nextents = 1 fsxattr.naextents = 0 dioattr.mem = 0x200 dioattr.miniosz = 512 dioattr.maxiosz = 2147483136 And now we try a byte-aligned direct write: # xfs_io -d -c 'pwrite -S 0x58 47 1' /mnt/off/file pwrite: Invalid argument # xfs_io -d -c 'pwrite -S 0x58 47 1' /mnt/on/file wrote 1/1 bytes at offset 47 1.000000 bytes, 1 ops; 0.0001 sec (5.194 KiB/sec and 5319.1489 ops/sec) --D
On Wed, 9 Aug 2023, Christoph Hellwig wrote: > Please do not add a new ->direct_IO method. I'm currently working hard > on removing it, just set FMODE_CAN_ODIRECT and handle the fallback in > your read_iter/write_iter methods. Thanks for the input, I'd missed that FMODE_CAN_ODIRECT development. I can see why you would surely prefer not to have a .direct_IO added. But whether that's right for tmpfs at this time, I'll let you and all decide: I've tried and tested the v2 patch now, and will send it out shortly; but it has to add a shmem_file_write_iter(), where shmem was doing fine with generic_file_write_iter() + direct_IO() stub before. So my own feeling is that the v1 patch with shmem_direct_IO() was better, duplicating less code; but whatever, you can all decide between them. > > But if we just start claiming direct I/O support for file systems that > don't actually support it, I'm starting to seriously wonder why we > bother with the flag at all and don't just allow O_DIRECT opens > to always succeed.. Yes, I've wondered that way too, but don't have a strong opinion on it. Hugh
On Thu, 10 Aug 2023, Darrick J. Wong wrote: > On Wed, Aug 09, 2023 at 06:41:17AM -0700, Christoph Hellwig wrote: > > Please do not add a new ->direct_IO method. I'm currently working hard > > on removing it, just set FMODE_CAN_ODIRECT and handle the fallback in > > your read_iter/write_iter methods. > > > > But if we just start claiming direct I/O support for file systems that > > don't actually support it, I'm starting to seriously wonder why we > > bother with the flag at all and don't just allow O_DIRECT opens > > to always succeed.. > > I see it differently -- you can do byte-aligned directio to S_DAX files > on persistent memory, so I don't see why you can't do that for tmpfs > files too. Helpful support, thanks. But I didn't read Christoph as unhappy with the granularity issue: just giving me directIOn to FMODE_CAN_ODIRECT, and rightly wondering why we ever fail O_DIRECTs. Hugh > > (I'm not advocating for letting *disk* based filesystems allow O_DIRECT > even if read and writes are always going to go through the page cache > and get flushed to disk. If programs wanted that, they'd use O_SYNC.) > > /mnt is a pmem filesystem, /mnt/on/file has S_DAX set, and /mnt/off/file > does not: > > # xfs_io -c statx /mnt/{on,off}/file > fd.path = "/mnt/on/file" > fd.flags = non-sync,non-direct,read-write > stat.ino = 132 > stat.type = regular file > stat.size = 1048576 > stat.blocks = 2048 > fsxattr.xflags = 0x8002 [-p------------x--] > fsxattr.projid = 0 > fsxattr.extsize = 0 > fsxattr.cowextsize = 0 > fsxattr.nextents = 1 > fsxattr.naextents = 0 > dioattr.mem = 0x200 > dioattr.miniosz = 512 > dioattr.maxiosz = 2147483136 > fd.path = "/mnt/off/file" > fd.flags = non-sync,non-direct,read-write > stat.ino = 8388737 > stat.type = regular file > stat.size = 1048576 > stat.blocks = 2048 > fsxattr.xflags = 0x2 [-p---------------] > fsxattr.projid = 0 > fsxattr.extsize = 0 > fsxattr.cowextsize = 0 > fsxattr.nextents = 1 > fsxattr.naextents = 0 > dioattr.mem = 0x200 > dioattr.miniosz = 512 > dioattr.maxiosz = 2147483136 > > And now we try a byte-aligned direct write: > > # xfs_io -d -c 'pwrite -S 0x58 47 1' /mnt/off/file > pwrite: Invalid argument > # xfs_io -d -c 'pwrite -S 0x58 47 1' /mnt/on/file > wrote 1/1 bytes at offset 47 > 1.000000 bytes, 1 ops; 0.0001 sec (5.194 KiB/sec and 5319.1489 ops/sec) > > --D
On Thu, Aug 10, 2023 at 11:16:20PM -0700, Hugh Dickins wrote: > Helpful support, thanks. But I didn't read Christoph as unhappy with > the granularity issue: just giving me directIOn to FMODE_CAN_ODIRECT, > and rightly wondering why we ever fail O_DIRECTs. Exactly.
diff --git a/mm/shmem.c b/mm/shmem.c index 7420b510a9f3..4d5599e566df 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2720,6 +2720,16 @@ shmem_write_end(struct file *file, struct address_space *mapping, return copied; } +static ssize_t shmem_direct_IO(struct kiocb *iocb, struct iov_iter *iter) +{ + /* + * Just leave all the work to the buffered fallback. + * Some LTP tests may expect us to enforce alignment restrictions, + * but the fallback works just fine with any alignment, so allow it. + */ + return 0; +} + static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct file *file = iocb->ki_filp; @@ -4421,6 +4431,7 @@ const struct address_space_operations shmem_aops = { #ifdef CONFIG_TMPFS .write_begin = shmem_write_begin, .write_end = shmem_write_end, + .direct_IO = shmem_direct_IO, #endif #ifdef CONFIG_MIGRATION .migrate_folio = migrate_folio,
Depending upon your philosophical viewpoint, either tmpfs always does direct IO, or it cannot ever do direct IO; but whichever, if tmpfs is to stand in for a more sophisticated filesystem, it can be helpful for tmpfs to support O_DIRECT. So, give tmpfs a shmem_direct_IO() method, of the simplest kind: by just returning 0 done, it leaves all the work to the buffered fallback (and everything else just happens to work out okay - in particular, its dirty pages don't get lost to invalidation). xfstests auto generic which were not run on tmpfs before but now pass: 036 091 113 125 130 133 135 198 207 208 209 210 211 212 214 226 239 263 323 355 391 406 412 422 427 446 451 465 551 586 591 609 615 647 708 729 with no new failures. LTP dio tests which were not run on tmpfs before but now pass: dio01 through dio30, except for dio04 and dio10, which fail because tmpfs dio read and write allow odd count: tmpfs could be made stricter, but would that be an improvement? Signed-off-by: Hugh Dickins <hughd@google.com> --- mm/shmem.c | 11 +++++++++++ 1 file changed, 11 insertions(+)