Message ID | 167391048988.2311931.1567396746365286847.stgit@warthog.procyon.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | iov_iter: Improve page extraction (ref, pin or just list) | expand |
On Mon, Jan 16, 2023 at 11:08:09PM +0000, David Howells wrote: > IOCB_WRITE is set by aio, io_uring and cachefiles before submitting a write > operation to the VFS, but it isn't set by, say, the write() system call. > > Fix this by setting IOCB_WRITE unconditionally in call_write_iter(). > > This will allow drivers to use IOCB_WRITE instead of the iterator data > source to determine the I/O direction. > > Signed-off-by: David Howells <dhowells@redhat.com> > cc: Alexander Viro <viro@zeniv.linux.org.uk> > cc: Christoph Hellwig <hch@lst.de> > cc: Jens Axboe <axboe@kernel.dk> > cc: linux-block@vger.kernel.org > cc: linux-fsdevel@vger.kernel.org > --- > > include/linux/fs.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 066555ad1bf8..649ff061440e 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2183,6 +2183,7 @@ static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio, > static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio, > struct iov_iter *iter) > { > + kio->ki_flags |= IOCB_WRITE; > return file->f_op->write_iter(kio, iter); > } This doesn't remove the existing setting of IOCB_WRITE, and also feelds like the wrong place. I suspect the best is to: - rename init_sync_kiocb to init_kiocb - pass a new argument for the destination to it. I'm not entirely sure if flags is a good thing, or an explicit READ/WRITE might be better because it's harder to get wrong, even if a the compiler might generate worth code for it. - also use it in the async callers (io_uring, aio, overlayfs, loop, nvmet, target, cachefs, file backed swap)
Christoph Hellwig <hch@infradead.org> wrote: > I suspect the best is to: > > - rename init_sync_kiocb to init_kiocb > - pass a new argument for the destination Do you mean the direction rather than the destination? > to it. I'm not entirely > sure if flags is a good thing, or an explicit READ/WRITE might be > better because it's harder to get wrong, even if a the compiler > might generate worth code for it. > - also use it in the async callers (io_uring, aio, overlayfs, loop, > nvmet, target, cachefs, file backed swap) David
On Tue, Jan 17, 2023 at 08:28:24AM +0000, David Howells wrote: > Christoph Hellwig <hch@infradead.org> wrote: > > > I suspect the best is to: > > > > - rename init_sync_kiocb to init_kiocb > > - pass a new argument for the destination > > Do you mean the direction rather than the destination? Yes.
Christoph Hellwig <hch@infradead.org> wrote: > I suspect the best is to: > > - rename init_sync_kiocb to init_kiocb > - pass a new argument for the direction to it. I'm not entirely > sure if flags is a good thing, or an explicit READ/WRITE might be > better because it's harder to get wrong, even if a the compiler > might generate worth code for it. So something like: init_kiocb(kiocb, file, WRITE); init_kiocb(kiocb, file, READ); David
On Tue, Jan 17, 2023 at 11:11:16AM +0000, David Howells wrote: > So something like: > > init_kiocb(kiocb, file, WRITE); > init_kiocb(kiocb, file, READ); Yes.
On Mon, Jan 16, 2023 at 11:08:09PM +0000, David Howells wrote: > IOCB_WRITE is set by aio, io_uring and cachefiles before submitting a write > operation to the VFS, but it isn't set by, say, the write() system call. > > Fix this by setting IOCB_WRITE unconditionally in call_write_iter(). Which does nothing for places that do not use call_write_iter()... __kernel_write_iter() is one such; for less obvious specimen see drivers/nvme/target/io-cmd-file.c:nvmet_file_submit_bvec() - there we have iocb coming from the caller and *not* fed to init_sync_kiocb(), so Christoph's suggestion doesn't work either. Sure, we could take care of that by adding ki_flags |= IOCB_WRITE in there, but... FWIW, call chains for ->write_iter() (as an explicit method call) are: ->write_iter() <- __kernel_write_iter() [init_sync_kiocb()] ->write_iter() <- call_write_iter() <- new_sync_write() [init_sync_kiocb()] ->write_iter() <- call_write_iter() <- do_iter_read_write() [init_sync_kiocb()] ->write_iter() <- call_write_iter() <- aio_write() [sets KIOCB_WRITE] ->write_iter() <- call_write_iter() <- io_write() [sets KIOCB_WRITE] ->write_iter() <- nvmet_file_submit_bvec() ->write_iter() <- call_write_iter() <- lo_rw_aio() ->write_iter() <- call_write_iter() <- fd_execute_rw_aio() ->write_iter() <- call_write_iter() <- vfs_iocb_iter_write() The last 4 neither set KIOCB_WRITE nor call init_sync_kiocb(). What's more, there are places that call instances (or their guts - look at btrfs_do_write_iter() callers) directly...
On Mon, Jan 16, 2023 at 11:52:43PM -0800, Christoph Hellwig wrote: > This doesn't remove the existing setting of IOCB_WRITE, and also > feelds like the wrong place. > > I suspect the best is to: > > - rename init_sync_kiocb to init_kiocb > - pass a new argument for the destination to it. I'm not entirely > sure if flags is a good thing, or an explicit READ/WRITE might be > better because it's harder to get wrong, even if a the compiler > might generate worth code for it. > - also use it in the async callers (io_uring, aio, overlayfs, loop, > nvmet, target, cachefs, file backed swap) Do you want it to mess with get_current_ioprio() for those? Looks wrong...
On Wed, Jan 18, 2023 at 10:05:38PM +0000, Al Viro wrote: > __kernel_write_iter() is one such; for less obvious specimen see > drivers/nvme/target/io-cmd-file.c:nvmet_file_submit_bvec() - there > we have iocb coming from the caller and *not* fed to init_sync_kiocb(), > so Christoph's suggestion doesn't work either. Sure, we could take > care of that by adding ki_flags |= IOCB_WRITE in there, but... None of the asyc users of iocbs currently uses init_sync_kiocb. My suggestion is to use it everywhere - we have less than a dozen of these that I all listed.
On Wed, Jan 18, 2023 at 10:11:45PM +0000, Al Viro wrote: > On Mon, Jan 16, 2023 at 11:52:43PM -0800, Christoph Hellwig wrote: > > > This doesn't remove the existing setting of IOCB_WRITE, and also > > feelds like the wrong place. > > > > I suspect the best is to: > > > > - rename init_sync_kiocb to init_kiocb > > - pass a new argument for the destination to it. I'm not entirely > > sure if flags is a good thing, or an explicit READ/WRITE might be > > better because it's harder to get wrong, even if a the compiler > > might generate worth code for it. > > - also use it in the async callers (io_uring, aio, overlayfs, loop, > > nvmet, target, cachefs, file backed swap) > > Do you want it to mess with get_current_ioprio() for those? Looks > wrong... We want to be consistent for sync vs async submission. So I think yes, we want to do the get_current_ioprio for most of them, exceptions beeing aio and io_uring - those could use a __init_iocb or init_iocb_ioprio variant that passs in the explicit priority if we want to avoid the call if it would be overriden later.
Al Viro <viro@zeniv.linux.org.uk> wrote: > Which does nothing for places that do not use call_write_iter()... > __kernel_write_iter() is one such; for less obvious specimen see > drivers/nvme/target/io-cmd-file.c:nvmet_file_submit_bvec() Should these be calling call_read/write_iter()? If not, should call_read/write_iter() be dropped? David
Al Viro <viro@zeniv.linux.org.uk> wrote: > ->write_iter() <- nvmet_file_submit_bvec() > ->write_iter() <- call_write_iter() <- lo_rw_aio() Could call init_kiocb() in lo_rw_aio() and then just overwrite ki_ioprio. > ->write_iter() <- call_write_iter() <- fd_execute_rw_aio() fd_execute_rw_aio() perhaps should call init_kiocb() since the struct is allocated with kmalloc() and not fully initialised. > ->write_iter() <- call_write_iter() <- vfs_iocb_iter_write() > > The last 4 neither set KIOCB_WRITE nor call init_sync_kiocb(). vfs_iocb_iter_write() is given an initialised kiocb. It should not be calling init_sync_kiocb() itself. It's called from two places: cachefiles, which initialises the kiocb itself and sets IOCB_WRITE, and overlayfs, which gets the kiocb from the VFS via its ->write_iter hook the caller of which should have already set IOCB_WRITE. cachefiles should be using init_kiocb() - though since it used kzalloc, init_kiocb() clearing the struct is redundant. > What's more, there are places that call instances (or their guts - look at > btrfs_do_write_iter() callers) directly... At least in the case of btrfs_ioctl_encoded_write(), that can call init_kiocb(). But as you say, there are more to be found. David
Christoph Hellwig <hch@infradead.org> wrote: > We want to be consistent for sync vs async submission. So I think yes, > we want to do the get_current_ioprio for most of them, exceptions > beeing aio and io_uring - those could use a __init_iocb or > init_iocb_ioprio variant that passs in the explicit priority if we want > to avoid the call if it would be overriden later. io_uring is a bit problematic in this regard. io_prep_rw() starts the initialisation of the kiocb, so io_read() and io_write() can't just reinitialise it. OTOH, I'm not sure io_prep_rw() has sufficient information to hand. I wonder if I should add a flag to struct io_op_def to indicate that this is going to be a write operation and maybe add a REQ_F_WRITE flag that gets set by that. David
On Thu, Jan 19, 2023 at 10:01:26AM +0000, David Howells wrote: > Al Viro <viro@zeniv.linux.org.uk> wrote: > > > Which does nothing for places that do not use call_write_iter()... > > __kernel_write_iter() is one such; for less obvious specimen see > > drivers/nvme/target/io-cmd-file.c:nvmet_file_submit_bvec() > > Should these be calling call_read/write_iter()? If not, should > call_read/write_iter() be dropped? I wish they'd just go away, they are a bit of a distraction.
On Thu, Jan 19, 2023 at 11:34:26AM +0000, David Howells wrote: > io_uring is a bit problematic in this regard. io_prep_rw() starts the > initialisation of the kiocb, so io_read() and io_write() can't just > reinitialise it. OTOH, I'm not sure io_prep_rw() has sufficient information > to hand. It could probably be refactored. That being said, I suspect we're better off deferring the whole iov_iter direction cleanup. It's a bit ugly right now, but there is nothing urgent. The gup pin work otoh really is something we need to get down rather sooner than later. So what about deferring this whole cleanup for now?
Christoph Hellwig <hch@infradead.org> wrote:
> So what about deferring this whole cleanup for now?
So you'd rather I stick with the direction indicator in the iov_iter struct
for now?
I still want to add iov_iter_extract_pages(), netfs_extract_user_iter() and
netfs_extract_iter_to_sg(), even if it's only cifs and netfslib that use them
for the moment.
David
diff --git a/include/linux/fs.h b/include/linux/fs.h index 066555ad1bf8..649ff061440e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2183,6 +2183,7 @@ static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio, static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio, struct iov_iter *iter) { + kio->ki_flags |= IOCB_WRITE; return file->f_op->write_iter(kio, iter); }
IOCB_WRITE is set by aio, io_uring and cachefiles before submitting a write operation to the VFS, but it isn't set by, say, the write() system call. Fix this by setting IOCB_WRITE unconditionally in call_write_iter(). This will allow drivers to use IOCB_WRITE instead of the iterator data source to determine the I/O direction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Christoph Hellwig <hch@lst.de> cc: Jens Axboe <axboe@kernel.dk> cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+)