Message ID | 20200221143723.482323-1-hch@lst.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | zonefs: fix IOCB_NOWAIT handling | expand |
Hi Christoph, On 2020/02/21 23:37, Christoph Hellwig wrote: > IOCB_NOWAIT can't just be ignored as it breaks applications expecting > it not to block. Just refuse the operation as applications must handle > that (e.g. by falling back to a thread pool). > > Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system") > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > fs/zonefs/super.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c > index 8bc6ef82d693..69aee3dfb660 100644 > --- a/fs/zonefs/super.c > +++ b/fs/zonefs/super.c > @@ -601,13 +601,13 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) > ssize_t ret; > > /* > - * For async direct IOs to sequential zone files, ignore IOCB_NOWAIT > + * For async direct IOs to sequential zone files, refuse IOCB_NOWAIT > * as this can cause write reordering (e.g. the first aio gets EAGAIN > * on the inode lock but the second goes through but is now unaligned). > */ > - if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb) > - && (iocb->ki_flags & IOCB_NOWAIT)) > - iocb->ki_flags &= ~IOCB_NOWAIT; > + if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb) && > + (iocb->ki_flags & IOCB_NOWAIT)) > + return -EOPNOTSUPP; > > if (iocb->ki_flags & IOCB_NOWAIT) { > if (!inode_trylock(inode)) > The main problem with allowing IOCB_NOWAIT is that for an application that sends multiple write AIOs with a single io_submit(), all write AIOs after one that is failed with -EAGAIN will be failed with -EINVAL (not -EIO !) since they will have an unaligned write offset. I am wondering if that is really a problem at all since the application can catch the -EAGAIN and -EINVAL, indicating that the write stream was stopped. So maybe it is reasonable to simply allow IOCB_NOWAIT ? Dave did not think it is. I was split on this. Would be happy to hear your opinion. Another solution I was thinking of is something like the hack below, which may be cleaner and easier for the application to handle as that reduces the number of errors for a multi-io io_submit() call with RWF_NOWAIT. To do so, this introduces the IOCB_WRITE_FAIL_FAST kiocb flag that is set in zonefs_file_dio_write() for a nowait kiocb to a sequential zone file. If this function fails (with -EAGAIN or any other error), this flag instruct aio_write() to return the error for the failed kiocb instead of 0. This error return of aio_write() then stops the io_submit() loop on the first failed iocb. diff --git a/fs/aio.c b/fs/aio.c index 5f3d3d814928..9f01541c8ecc 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1568,6 +1568,8 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb, return ret; ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter)); if (!ret) { + ssize_t err; + /* * Open-code file_start_write here to grab freeze protection, * which will be released by another thread in @@ -1580,7 +1582,12 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb, __sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE); } req->ki_flags |= IOCB_WRITE; - aio_rw_done(req, call_write_iter(file, req, &iter)); + err = call_write_iter(file, req, &iter); + if (err != -EIOCBQUEUED) { + aio_rw_done(req, err); + if ((req->ki_flags & IOCB_WRITE_FAIL_FAST) && err < 0) + ret = err; + } } kfree(iovec); return ret; diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index 8bc6ef82d693..0fa098354e38 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -601,13 +601,14 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) ssize_t ret; /* - * For async direct IOs to sequential zone files, ignore IOCB_NOWAIT - * as this can cause write reordering (e.g. the first aio gets EAGAIN - * on the inode lock but the second goes through but is now unaligned). + * For async direct IOs to sequential zone files, IOCB_NOWAIT can cause + * unaligned writes in case of EAGAIN error or any failure to issue the + * direct IO. So tell vfs to stop io_submit() on the first failure to + * avoid more failed aios than necessary. */ if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb) && (iocb->ki_flags & IOCB_NOWAIT)) - iocb->ki_flags &= ~IOCB_NOWAIT; + iocb->ki_flags |= IOCB_WRITE_FAIL_FAST; if (iocb->ki_flags & IOCB_NOWAIT) { if (!inode_trylock(inode)) diff --git a/include/linux/fs.h b/include/linux/fs.h index 3cd4fe6b845e..8e7df1cc92e4 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -314,6 +314,7 @@ enum rw_hint { #define IOCB_SYNC (1 << 5) #define IOCB_WRITE (1 << 6) #define IOCB_NOWAIT (1 << 7) +#define IOCB_WRITE_FAIL_FAST (1 << 8) struct kiocb { struct file *ki_filp; Of note is that checking the code path for iomap_direct_io(), I noticed that there are at least 2 places places where the code can block: dio allocation with GFP_KERNEL and the BIOs allocations for that dio also with GFP_KERNEL. While the latter may be necessary to avoid partial failures of large direct IOs that need to be split into multiple BIOs, shouldn't the dio allocation be done with GFP_NOWAIT when IOMAP_NOWAIT is set ? Best regards.
On Mon, Feb 24, 2020 at 01:48:48PM +0000, Damien Le Moal wrote: > > The main problem with allowing IOCB_NOWAIT is that for an application that sends > multiple write AIOs with a single io_submit(), all write AIOs after one that is > failed with -EAGAIN will be failed with -EINVAL (not -EIO !) since they will > have an unaligned write offset. I am wondering if that is really a problem at > all since the application can catch the -EAGAIN and -EINVAL, indicating that the > write stream was stopped. So maybe it is reasonable to simply allow IOCB_NOWAIT ? I don't think supporting IOCB_NOWAIT with current zonefs is very useful. It will be very useful once Zone Append is supported. But more importantly my patch fixes a bug in the current zonefs implementation. We need to fix that before 5.6 is released. Any additional functionaity can still come later if we think that it is useful.
On 2020/02/25 5:19, Christoph Hellwig wrote: > On Mon, Feb 24, 2020 at 01:48:48PM +0000, Damien Le Moal wrote: >> >> The main problem with allowing IOCB_NOWAIT is that for an application that sends >> multiple write AIOs with a single io_submit(), all write AIOs after one that is >> failed with -EAGAIN will be failed with -EINVAL (not -EIO !) since they will >> have an unaligned write offset. I am wondering if that is really a problem at >> all since the application can catch the -EAGAIN and -EINVAL, indicating that the >> write stream was stopped. So maybe it is reasonable to simply allow IOCB_NOWAIT ? > > I don't think supporting IOCB_NOWAIT with current zonefs is very useful. > It will be very useful once Zone Append is supported. > > But more importantly my patch fixes a bug in the current zonefs > implementation. We need to fix that before 5.6 is released. Any > additional functionaity can still come later if we think that it is > useful. > OK. Fair enough. I will apply your patch. Thanks !
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index 8bc6ef82d693..69aee3dfb660 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -601,13 +601,13 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) ssize_t ret; /* - * For async direct IOs to sequential zone files, ignore IOCB_NOWAIT + * For async direct IOs to sequential zone files, refuse IOCB_NOWAIT * as this can cause write reordering (e.g. the first aio gets EAGAIN * on the inode lock but the second goes through but is now unaligned). */ - if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb) - && (iocb->ki_flags & IOCB_NOWAIT)) - iocb->ki_flags &= ~IOCB_NOWAIT; + if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb) && + (iocb->ki_flags & IOCB_NOWAIT)) + return -EOPNOTSUPP; if (iocb->ki_flags & IOCB_NOWAIT) { if (!inode_trylock(inode))
IOCB_NOWAIT can't just be ignored as it breaks applications expecting it not to block. Just refuse the operation as applications must handle that (e.g. by falling back to a thread pool). Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system") Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/zonefs/super.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)