diff mbox series

[vfs.tmpfs,4/5] tmpfs: trivial support for direct IO

Message ID 7c12819-9b94-d56-ff88-35623aa34180@google.com (mailing list archive)
State New, archived
Headers show
Series tmpfs: user xattrs and direct IO | expand

Commit Message

Hugh Dickins Aug. 9, 2023, 4:34 a.m. UTC
Depending upon your philosophical viewpoint, either tmpfs always does
direct IO, or it cannot ever do direct IO; but whichever, if tmpfs is to
stand in for a more sophisticated filesystem, it can be helpful for tmpfs
to support O_DIRECT.  So, give tmpfs a shmem_direct_IO() method, of the
simplest kind: by just returning 0 done, it leaves all the work to the
buffered fallback (and everything else just happens to work out okay -
in particular, its dirty pages don't get lost to invalidation).

xfstests auto generic which were not run on tmpfs before but now pass:
036 091 113 125 130 133 135 198 207 208 209 210 211 212 214 226 239 263
323 355 391 406 412 422 427 446 451 465 551 586 591 609 615 647 708 729
with no new failures.

LTP dio tests which were not run on tmpfs before but now pass:
dio01 through dio30, except for dio04 and dio10, which fail because
tmpfs dio read and write allow odd count: tmpfs could be made stricter,
but would that be an improvement?

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/shmem.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Jan Kara Aug. 9, 2023, 9:54 a.m. UTC | #1
On Tue 08-08-23 21:34:54, Hugh Dickins wrote:
> Depending upon your philosophical viewpoint, either tmpfs always does
> direct IO, or it cannot ever do direct IO; but whichever, if tmpfs is to
> stand in for a more sophisticated filesystem, it can be helpful for tmpfs
> to support O_DIRECT.  So, give tmpfs a shmem_direct_IO() method, of the
> simplest kind: by just returning 0 done, it leaves all the work to the
> buffered fallback (and everything else just happens to work out okay -
> in particular, its dirty pages don't get lost to invalidation).
> 
> xfstests auto generic which were not run on tmpfs before but now pass:
> 036 091 113 125 130 133 135 198 207 208 209 210 211 212 214 226 239 263
> 323 355 391 406 412 422 427 446 451 465 551 586 591 609 615 647 708 729
> with no new failures.
> 
> LTP dio tests which were not run on tmpfs before but now pass:
> dio01 through dio30, except for dio04 and dio10, which fail because
> tmpfs dio read and write allow odd count: tmpfs could be made stricter,
> but would that be an improvement?
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Yeah, we are not quite consistent about whether it is better to silently
fallback to buffered IO or return error among filesystems. So I guess
whatever you like. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  mm/shmem.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 7420b510a9f3..4d5599e566df 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2720,6 +2720,16 @@ shmem_write_end(struct file *file, struct address_space *mapping,
>  	return copied;
>  }
>  
> +static ssize_t shmem_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
> +{
> +	/*
> +	 * Just leave all the work to the buffered fallback.
> +	 * Some LTP tests may expect us to enforce alignment restrictions,
> +	 * but the fallback works just fine with any alignment, so allow it.
> +	 */
> +	return 0;
> +}
> +
>  static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  {
>  	struct file *file = iocb->ki_filp;
> @@ -4421,6 +4431,7 @@ const struct address_space_operations shmem_aops = {
>  #ifdef CONFIG_TMPFS
>  	.write_begin	= shmem_write_begin,
>  	.write_end	= shmem_write_end,
> +	.direct_IO	= shmem_direct_IO,
>  #endif
>  #ifdef CONFIG_MIGRATION
>  	.migrate_folio	= migrate_folio,
> -- 
> 2.35.3
>
Christoph Hellwig Aug. 9, 2023, 1:41 p.m. UTC | #2
Please do not add a new ->direct_IO method.  I'm currently working hard
on removing it, just set FMODE_CAN_ODIRECT and handle the fallback in
your read_iter/write_iter methods.

But if we just start claiming direct I/O support for file systems that
don't actually support it, I'm starting to seriously wonder why we
bother with the flag at all and don't just allow O_DIRECT opens
to always succeed..
Darrick J. Wong Aug. 10, 2023, 11:41 p.m. UTC | #3
On Wed, Aug 09, 2023 at 06:41:17AM -0700, Christoph Hellwig wrote:
> Please do not add a new ->direct_IO method.  I'm currently working hard
> on removing it, just set FMODE_CAN_ODIRECT and handle the fallback in
> your read_iter/write_iter methods.
> 
> But if we just start claiming direct I/O support for file systems that
> don't actually support it, I'm starting to seriously wonder why we
> bother with the flag at all and don't just allow O_DIRECT opens
> to always succeed..

I see it differently -- you can do byte-aligned directio to S_DAX files
on persistent memory, so I don't see why you can't do that for tmpfs
files too.

(I'm not advocating for letting *disk* based filesystems allow O_DIRECT
even if read and writes are always going to go through the page cache
and get flushed to disk.  If programs wanted that, they'd use O_SYNC.)

/mnt is a pmem filesystem, /mnt/on/file has S_DAX set, and /mnt/off/file
does not:

# xfs_io -c statx /mnt/{on,off}/file
fd.path = "/mnt/on/file"
fd.flags = non-sync,non-direct,read-write
stat.ino = 132
stat.type = regular file
stat.size = 1048576
stat.blocks = 2048
fsxattr.xflags = 0x8002 [-p------------x--]
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.cowextsize = 0
fsxattr.nextents = 1
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136
fd.path = "/mnt/off/file"
fd.flags = non-sync,non-direct,read-write
stat.ino = 8388737
stat.type = regular file
stat.size = 1048576
stat.blocks = 2048
fsxattr.xflags = 0x2 [-p---------------]
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.cowextsize = 0
fsxattr.nextents = 1
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136

And now we try a byte-aligned direct write:

# xfs_io -d -c 'pwrite -S 0x58 47 1' /mnt/off/file
pwrite: Invalid argument
# xfs_io -d -c 'pwrite -S 0x58 47 1' /mnt/on/file
wrote 1/1 bytes at offset 47
1.000000 bytes, 1 ops; 0.0001 sec (5.194 KiB/sec and 5319.1489 ops/sec)

--D
Hugh Dickins Aug. 11, 2023, 6:08 a.m. UTC | #4
On Wed, 9 Aug 2023, Christoph Hellwig wrote:

> Please do not add a new ->direct_IO method.  I'm currently working hard
> on removing it, just set FMODE_CAN_ODIRECT and handle the fallback in
> your read_iter/write_iter methods.

Thanks for the input, I'd missed that FMODE_CAN_ODIRECT development.
I can see why you would surely prefer not to have a .direct_IO added.

But whether that's right for tmpfs at this time, I'll let you and all
decide: I've tried and tested the v2 patch now, and will send it out
shortly; but it has to add a shmem_file_write_iter(), where shmem was
doing fine with generic_file_write_iter() + direct_IO() stub before.

So my own feeling is that the v1 patch with shmem_direct_IO() was better,
duplicating less code; but whatever, you can all decide between them.

> 
> But if we just start claiming direct I/O support for file systems that
> don't actually support it, I'm starting to seriously wonder why we
> bother with the flag at all and don't just allow O_DIRECT opens
> to always succeed..

Yes, I've wondered that way too, but don't have a strong opinion on it.

Hugh
Hugh Dickins Aug. 11, 2023, 6:16 a.m. UTC | #5
On Thu, 10 Aug 2023, Darrick J. Wong wrote:
> On Wed, Aug 09, 2023 at 06:41:17AM -0700, Christoph Hellwig wrote:
> > Please do not add a new ->direct_IO method.  I'm currently working hard
> > on removing it, just set FMODE_CAN_ODIRECT and handle the fallback in
> > your read_iter/write_iter methods.
> > 
> > But if we just start claiming direct I/O support for file systems that
> > don't actually support it, I'm starting to seriously wonder why we
> > bother with the flag at all and don't just allow O_DIRECT opens
> > to always succeed..
> 
> I see it differently -- you can do byte-aligned directio to S_DAX files
> on persistent memory, so I don't see why you can't do that for tmpfs
> files too.

Helpful support, thanks.  But I didn't read Christoph as unhappy with
the granularity issue: just giving me directIOn to FMODE_CAN_ODIRECT,
and rightly wondering why we ever fail O_DIRECTs.

Hugh

> 
> (I'm not advocating for letting *disk* based filesystems allow O_DIRECT
> even if read and writes are always going to go through the page cache
> and get flushed to disk.  If programs wanted that, they'd use O_SYNC.)
> 
> /mnt is a pmem filesystem, /mnt/on/file has S_DAX set, and /mnt/off/file
> does not:
> 
> # xfs_io -c statx /mnt/{on,off}/file
> fd.path = "/mnt/on/file"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 132
> stat.type = regular file
> stat.size = 1048576
> stat.blocks = 2048
> fsxattr.xflags = 0x8002 [-p------------x--]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.cowextsize = 0
> fsxattr.nextents = 1
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
> fd.path = "/mnt/off/file"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 8388737
> stat.type = regular file
> stat.size = 1048576
> stat.blocks = 2048
> fsxattr.xflags = 0x2 [-p---------------]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.cowextsize = 0
> fsxattr.nextents = 1
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
> 
> And now we try a byte-aligned direct write:
> 
> # xfs_io -d -c 'pwrite -S 0x58 47 1' /mnt/off/file
> pwrite: Invalid argument
> # xfs_io -d -c 'pwrite -S 0x58 47 1' /mnt/on/file
> wrote 1/1 bytes at offset 47
> 1.000000 bytes, 1 ops; 0.0001 sec (5.194 KiB/sec and 5319.1489 ops/sec)
> 
> --D
Christoph Hellwig Aug. 11, 2023, 8:34 a.m. UTC | #6
On Thu, Aug 10, 2023 at 11:16:20PM -0700, Hugh Dickins wrote:
> Helpful support, thanks.  But I didn't read Christoph as unhappy with
> the granularity issue: just giving me directIOn to FMODE_CAN_ODIRECT,
> and rightly wondering why we ever fail O_DIRECTs.

Exactly.
diff mbox series

Patch

diff --git a/mm/shmem.c b/mm/shmem.c
index 7420b510a9f3..4d5599e566df 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2720,6 +2720,16 @@  shmem_write_end(struct file *file, struct address_space *mapping,
 	return copied;
 }
 
+static ssize_t shmem_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
+{
+	/*
+	 * Just leave all the work to the buffered fallback.
+	 * Some LTP tests may expect us to enforce alignment restrictions,
+	 * but the fallback works just fine with any alignment, so allow it.
+	 */
+	return 0;
+}
+
 static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
 	struct file *file = iocb->ki_filp;
@@ -4421,6 +4431,7 @@  const struct address_space_operations shmem_aops = {
 #ifdef CONFIG_TMPFS
 	.write_begin	= shmem_write_begin,
 	.write_end	= shmem_write_end,
+	.direct_IO	= shmem_direct_IO,
 #endif
 #ifdef CONFIG_MIGRATION
 	.migrate_folio	= migrate_folio,