diff mbox series

[v6,01/34] vfs: Unconditionally set IOCB_WRITE in call_write_iter()

Message ID 167391048988.2311931.1567396746365286847.stgit@warthog.procyon.org.uk (mailing list archive)
State New, archived
Headers show
Series iov_iter: Improve page extraction (ref, pin or just list) | expand

Commit Message

David Howells Jan. 16, 2023, 11:08 p.m. UTC
IOCB_WRITE is set by aio, io_uring and cachefiles before submitting a write
operation to the VFS, but it isn't set by, say, the write() system call.

Fix this by setting IOCB_WRITE unconditionally in call_write_iter().

This will allow drivers to use IOCB_WRITE instead of the iterator data
source to determine the I/O direction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---

 include/linux/fs.h |    1 +
 1 file changed, 1 insertion(+)

Comments

Christoph Hellwig Jan. 17, 2023, 7:52 a.m. UTC | #1
On Mon, Jan 16, 2023 at 11:08:09PM +0000, David Howells wrote:
> IOCB_WRITE is set by aio, io_uring and cachefiles before submitting a write
> operation to the VFS, but it isn't set by, say, the write() system call.
> 
> Fix this by setting IOCB_WRITE unconditionally in call_write_iter().
> 
> This will allow drivers to use IOCB_WRITE instead of the iterator data
> source to determine the I/O direction.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Alexander Viro <viro@zeniv.linux.org.uk>
> cc: Christoph Hellwig <hch@lst.de>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: linux-block@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> ---
> 
>  include/linux/fs.h |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 066555ad1bf8..649ff061440e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2183,6 +2183,7 @@ static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio,
>  static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
>  				      struct iov_iter *iter)
>  {
> +	kio->ki_flags |= IOCB_WRITE;
>  	return file->f_op->write_iter(kio, iter);
>  }

This doesn't remove the existing setting of IOCB_WRITE, and also
feelds like the wrong place.

I suspect the best is to:

 - rename init_sync_kiocb to init_kiocb
 - pass a new argument for the destination to it.  I'm not entirely
   sure if flags is a good thing, or an explicit READ/WRITE might be
   better because it's harder to get wrong, even if a the compiler
   might generate worth code for it.
 - also use it in the async callers (io_uring, aio, overlayfs, loop,
   nvmet, target, cachefs, file backed swap)
David Howells Jan. 17, 2023, 8:28 a.m. UTC | #2
Christoph Hellwig <hch@infradead.org> wrote:

> I suspect the best is to:
> 
>  - rename init_sync_kiocb to init_kiocb
>  - pass a new argument for the destination

Do you mean the direction rather than the destination?

>    to it.  I'm not entirely
>    sure if flags is a good thing, or an explicit READ/WRITE might be
>    better because it's harder to get wrong, even if a the compiler
>    might generate worth code for it.
>  - also use it in the async callers (io_uring, aio, overlayfs, loop,
>    nvmet, target, cachefs, file backed swap)

David
Christoph Hellwig Jan. 17, 2023, 8:44 a.m. UTC | #3
On Tue, Jan 17, 2023 at 08:28:24AM +0000, David Howells wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > I suspect the best is to:
> > 
> >  - rename init_sync_kiocb to init_kiocb
> >  - pass a new argument for the destination
> 
> Do you mean the direction rather than the destination?

Yes.
David Howells Jan. 17, 2023, 11:11 a.m. UTC | #4
Christoph Hellwig <hch@infradead.org> wrote:

> I suspect the best is to:
> 
>  - rename init_sync_kiocb to init_kiocb
>  - pass a new argument for the direction to it.  I'm not entirely
>    sure if flags is a good thing, or an explicit READ/WRITE might be
>    better because it's harder to get wrong, even if a the compiler
>    might generate worth code for it.

So something like:

	init_kiocb(kiocb, file, WRITE);
	init_kiocb(kiocb, file, READ);

David
Christoph Hellwig Jan. 17, 2023, 11:11 a.m. UTC | #5
On Tue, Jan 17, 2023 at 11:11:16AM +0000, David Howells wrote:
> So something like:
> 
> 	init_kiocb(kiocb, file, WRITE);
> 	init_kiocb(kiocb, file, READ);

Yes.
Al Viro Jan. 18, 2023, 10:05 p.m. UTC | #6
On Mon, Jan 16, 2023 at 11:08:09PM +0000, David Howells wrote:
> IOCB_WRITE is set by aio, io_uring and cachefiles before submitting a write
> operation to the VFS, but it isn't set by, say, the write() system call.
> 
> Fix this by setting IOCB_WRITE unconditionally in call_write_iter().

	Which does nothing for places that do not use call_write_iter()...
__kernel_write_iter() is one such; for less obvious specimen see
drivers/nvme/target/io-cmd-file.c:nvmet_file_submit_bvec() - there
we have iocb coming from the caller and *not* fed to init_sync_kiocb(),
so Christoph's suggestion doesn't work either.  Sure, we could take
care of that by adding ki_flags |= IOCB_WRITE in there, but...

FWIW, call chains for ->write_iter() (as an explicit method call) are:

->write_iter() <- __kernel_write_iter() [init_sync_kiocb()]
->write_iter() <- call_write_iter() <- new_sync_write() [init_sync_kiocb()]
->write_iter() <- call_write_iter() <- do_iter_read_write() [init_sync_kiocb()]
->write_iter() <- call_write_iter() <- aio_write() [sets KIOCB_WRITE]
->write_iter() <- call_write_iter() <- io_write() [sets KIOCB_WRITE]

->write_iter() <- nvmet_file_submit_bvec()
->write_iter() <- call_write_iter() <- lo_rw_aio()
->write_iter() <- call_write_iter() <- fd_execute_rw_aio()
->write_iter() <- call_write_iter() <- vfs_iocb_iter_write()

The last 4 neither set KIOCB_WRITE nor call init_sync_kiocb().  What's
more, there are places that call instances (or their guts - look at
btrfs_do_write_iter() callers) directly...
Al Viro Jan. 18, 2023, 10:11 p.m. UTC | #7
On Mon, Jan 16, 2023 at 11:52:43PM -0800, Christoph Hellwig wrote:

> This doesn't remove the existing setting of IOCB_WRITE, and also
> feelds like the wrong place.
> 
> I suspect the best is to:
> 
>  - rename init_sync_kiocb to init_kiocb
>  - pass a new argument for the destination to it.  I'm not entirely
>    sure if flags is a good thing, or an explicit READ/WRITE might be
>    better because it's harder to get wrong, even if a the compiler
>    might generate worth code for it.
>  - also use it in the async callers (io_uring, aio, overlayfs, loop,
>    nvmet, target, cachefs, file backed swap)

Do you want it to mess with get_current_ioprio() for those?  Looks
wrong...
Christoph Hellwig Jan. 19, 2023, 5:41 a.m. UTC | #8
On Wed, Jan 18, 2023 at 10:05:38PM +0000, Al Viro wrote:
> __kernel_write_iter() is one such; for less obvious specimen see
> drivers/nvme/target/io-cmd-file.c:nvmet_file_submit_bvec() - there
> we have iocb coming from the caller and *not* fed to init_sync_kiocb(),
> so Christoph's suggestion doesn't work either.  Sure, we could take
> care of that by adding ki_flags |= IOCB_WRITE in there, but...

None of the asyc users of iocbs currently uses init_sync_kiocb.  My
suggestion is to use it everywhere - we have less than a dozen of
these that I all listed.
Christoph Hellwig Jan. 19, 2023, 5:44 a.m. UTC | #9
On Wed, Jan 18, 2023 at 10:11:45PM +0000, Al Viro wrote:
> On Mon, Jan 16, 2023 at 11:52:43PM -0800, Christoph Hellwig wrote:
> 
> > This doesn't remove the existing setting of IOCB_WRITE, and also
> > feelds like the wrong place.
> > 
> > I suspect the best is to:
> > 
> >  - rename init_sync_kiocb to init_kiocb
> >  - pass a new argument for the destination to it.  I'm not entirely
> >    sure if flags is a good thing, or an explicit READ/WRITE might be
> >    better because it's harder to get wrong, even if a the compiler
> >    might generate worth code for it.
> >  - also use it in the async callers (io_uring, aio, overlayfs, loop,
> >    nvmet, target, cachefs, file backed swap)
> 
> Do you want it to mess with get_current_ioprio() for those?  Looks
> wrong...

We want to be consistent for sync vs async submission.  So I think yes,
we want to do the get_current_ioprio for most of them, exceptions
beeing aio and io_uring - those could use a __init_iocb or
init_iocb_ioprio variant that passs in the explicit priority if we want
to avoid the call if it would be overriden later.
David Howells Jan. 19, 2023, 10:01 a.m. UTC | #10
Al Viro <viro@zeniv.linux.org.uk> wrote:

> 	Which does nothing for places that do not use call_write_iter()...
> __kernel_write_iter() is one such; for less obvious specimen see
> drivers/nvme/target/io-cmd-file.c:nvmet_file_submit_bvec()

Should these be calling call_read/write_iter()?  If not, should
call_read/write_iter() be dropped?

David
David Howells Jan. 19, 2023, 11:06 a.m. UTC | #11
Al Viro <viro@zeniv.linux.org.uk> wrote:

> ->write_iter() <- nvmet_file_submit_bvec()
> ->write_iter() <- call_write_iter() <- lo_rw_aio()

Could call init_kiocb() in lo_rw_aio() and then just overwrite ki_ioprio.

> ->write_iter() <- call_write_iter() <- fd_execute_rw_aio()

fd_execute_rw_aio() perhaps should call init_kiocb() since the struct is
allocated with kmalloc() and not fully initialised.

> ->write_iter() <- call_write_iter() <- vfs_iocb_iter_write()
>
> The last 4 neither set KIOCB_WRITE nor call init_sync_kiocb().

vfs_iocb_iter_write() is given an initialised kiocb.  It should not be calling
init_sync_kiocb() itself.

It's called from two places: cachefiles, which initialises the kiocb itself
and sets IOCB_WRITE, and overlayfs, which gets the kiocb from the VFS via its
->write_iter hook the caller of which should have already set IOCB_WRITE.

cachefiles should be using init_kiocb() - though since it used kzalloc,
init_kiocb() clearing the struct is redundant.

> What's more, there are places that call instances (or their guts - look at
> btrfs_do_write_iter() callers) directly...

At least in the case of btrfs_ioctl_encoded_write(), that can call
init_kiocb().  But as you say, there are more to be found.

David
David Howells Jan. 19, 2023, 11:34 a.m. UTC | #12
Christoph Hellwig <hch@infradead.org> wrote:

> We want to be consistent for sync vs async submission.  So I think yes,
> we want to do the get_current_ioprio for most of them, exceptions
> beeing aio and io_uring - those could use a __init_iocb or
> init_iocb_ioprio variant that passs in the explicit priority if we want
> to avoid the call if it would be overriden later.

io_uring is a bit problematic in this regard.  io_prep_rw() starts the
initialisation of the kiocb, so io_read() and io_write() can't just
reinitialise it.  OTOH, I'm not sure io_prep_rw() has sufficient information
to hand.

I wonder if I should add a flag to struct io_op_def to indicate that this is
going to be a write operation and maybe add a REQ_F_WRITE flag that gets set
by that.

David
Christoph Hellwig Jan. 19, 2023, 4:46 p.m. UTC | #13
On Thu, Jan 19, 2023 at 10:01:26AM +0000, David Howells wrote:
> Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> > 	Which does nothing for places that do not use call_write_iter()...
> > __kernel_write_iter() is one such; for less obvious specimen see
> > drivers/nvme/target/io-cmd-file.c:nvmet_file_submit_bvec()
> 
> Should these be calling call_read/write_iter()?  If not, should
> call_read/write_iter() be dropped?

I wish they'd just go away, they are a bit of a distraction.
Christoph Hellwig Jan. 19, 2023, 4:48 p.m. UTC | #14
On Thu, Jan 19, 2023 at 11:34:26AM +0000, David Howells wrote:
> io_uring is a bit problematic in this regard.  io_prep_rw() starts the
> initialisation of the kiocb, so io_read() and io_write() can't just
> reinitialise it.  OTOH, I'm not sure io_prep_rw() has sufficient information
> to hand.

It could probably be refactored.  That being said, I suspect we're
better off deferring the whole iov_iter direction cleanup. It's a bit
ugly right now, but there is nothing urgent.  The gup pin work otoh
really is something we need to get down rather sooner than later.

So what about deferring this whole cleanup for now?
David Howells Jan. 19, 2023, 9:14 p.m. UTC | #15
Christoph Hellwig <hch@infradead.org> wrote:

> So what about deferring this whole cleanup for now?

So you'd rather I stick with the direction indicator in the iov_iter struct
for now?

I still want to add iov_iter_extract_pages(), netfs_extract_user_iter() and
netfs_extract_iter_to_sg(), even if it's only cifs and netfslib that use them
for the moment.

David
diff mbox series

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 066555ad1bf8..649ff061440e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2183,6 +2183,7 @@  static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio,
 static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
 				      struct iov_iter *iter)
 {
+	kio->ki_flags |= IOCB_WRITE;
 	return file->f_op->write_iter(kio, iter);
 }