diff mbox series

zonefs: fix IOCB_NOWAIT handling

Message ID 20200221143723.482323-1-hch@lst.de (mailing list archive)
State New, archived
Headers show
Series zonefs: fix IOCB_NOWAIT handling | expand

Commit Message

Christoph Hellwig Feb. 21, 2020, 2:37 p.m. UTC
IOCB_NOWAIT can't just be ignored as it breaks applications expecting
it not to block.  Just refuse the operation as applications must handle
that (e.g. by falling back to a thread pool).

Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/zonefs/super.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Damien Le Moal Feb. 24, 2020, 1:48 p.m. UTC | #1
Hi Christoph,

On 2020/02/21 23:37, Christoph Hellwig wrote:
> IOCB_NOWAIT can't just be ignored as it breaks applications expecting
> it not to block.  Just refuse the operation as applications must handle
> that (e.g. by falling back to a thread pool).
> 
> Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/zonefs/super.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index 8bc6ef82d693..69aee3dfb660 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -601,13 +601,13 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
>  	ssize_t ret;
>  
>  	/*
> -	 * For async direct IOs to sequential zone files, ignore IOCB_NOWAIT
> +	 * For async direct IOs to sequential zone files, refuse IOCB_NOWAIT
>  	 * as this can cause write reordering (e.g. the first aio gets EAGAIN
>  	 * on the inode lock but the second goes through but is now unaligned).
>  	 */
> -	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb)
> -	    && (iocb->ki_flags & IOCB_NOWAIT))
> -		iocb->ki_flags &= ~IOCB_NOWAIT;
> +	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb) &&
> +	    (iocb->ki_flags & IOCB_NOWAIT))
> +		return -EOPNOTSUPP;
>  
>  	if (iocb->ki_flags & IOCB_NOWAIT) {
>  		if (!inode_trylock(inode))
> 

The main problem with allowing IOCB_NOWAIT is that for an application that sends
multiple write AIOs with a single io_submit(), all write AIOs after one that is
failed with -EAGAIN will be failed with -EINVAL (not -EIO !) since they will
have an unaligned write offset. I am wondering if that is really a problem at
all since the application can catch the -EAGAIN and -EINVAL, indicating that the
write stream was stopped. So maybe it is reasonable to simply allow IOCB_NOWAIT ?

Dave did not think it is. I was split on this. Would be happy to hear your
opinion. Another solution I was thinking of is something like the hack below,
which may be cleaner and easier for the application to handle as that reduces
the number of errors for a multi-io io_submit() call with RWF_NOWAIT. To do so,
this introduces the IOCB_WRITE_FAIL_FAST kiocb flag that is set in
zonefs_file_dio_write() for a nowait kiocb to a sequential zone file. If this
function fails (with -EAGAIN or any other error), this flag instruct aio_write()
to return the error for the failed kiocb instead of 0. This error return of
aio_write() then stops the io_submit() loop on the first failed iocb.


diff --git a/fs/aio.c b/fs/aio.c
index 5f3d3d814928..9f01541c8ecc 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1568,6 +1568,8 @@ static int aio_write(struct kiocb *req, const struct iocb
*iocb,
 		return ret;
 	ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
 	if (!ret) {
+		ssize_t err;
+
 		/*
 		 * Open-code file_start_write here to grab freeze protection,
 		 * which will be released by another thread in
@@ -1580,7 +1582,12 @@ static int aio_write(struct kiocb *req, const struct iocb
*iocb,
 			__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
 		}
 		req->ki_flags |= IOCB_WRITE;
-		aio_rw_done(req, call_write_iter(file, req, &iter));
+		err = call_write_iter(file, req, &iter);
+		if (err != -EIOCBQUEUED) {
+		    aio_rw_done(req, err);
+		    if ((req->ki_flags & IOCB_WRITE_FAIL_FAST) && err < 0)
+			    ret = err;
+		}
 	}
 	kfree(iovec);
 	return ret;
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 8bc6ef82d693..0fa098354e38 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -601,13 +601,14 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb,
struct iov_iter *from)
 	ssize_t ret;

 	/*
-	 * For async direct IOs to sequential zone files, ignore IOCB_NOWAIT
-	 * as this can cause write reordering (e.g. the first aio gets EAGAIN
-	 * on the inode lock but the second goes through but is now unaligned).
+	 * For async direct IOs to sequential zone files, IOCB_NOWAIT can cause
+	 * unaligned writes in case of EAGAIN error or any failure to issue the
+	 * direct IO. So tell vfs to stop io_submit() on the first failure to
+	 * avoid more failed aios than necessary.
 	 */
 	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb)
 	    && (iocb->ki_flags & IOCB_NOWAIT))
-		iocb->ki_flags &= ~IOCB_NOWAIT;
+		iocb->ki_flags |= IOCB_WRITE_FAIL_FAST;

 	if (iocb->ki_flags & IOCB_NOWAIT) {
 		if (!inode_trylock(inode))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3cd4fe6b845e..8e7df1cc92e4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -314,6 +314,7 @@ enum rw_hint {
 #define IOCB_SYNC		(1 << 5)
 #define IOCB_WRITE		(1 << 6)
 #define IOCB_NOWAIT		(1 << 7)
+#define IOCB_WRITE_FAIL_FAST	(1 << 8)

 struct kiocb {
 	struct file		*ki_filp;


Of note is that checking the code path for iomap_direct_io(), I noticed that
there are at least 2 places places where the code can block: dio allocation with
GFP_KERNEL and the BIOs allocations for that dio also with GFP_KERNEL. While the
latter may be necessary to avoid partial failures of large direct IOs that need
to be split into multiple BIOs, shouldn't the dio allocation be done with
GFP_NOWAIT when IOMAP_NOWAIT is set ?

Best regards.
Christoph Hellwig Feb. 24, 2020, 8:18 p.m. UTC | #2
On Mon, Feb 24, 2020 at 01:48:48PM +0000, Damien Le Moal wrote:
> 
> The main problem with allowing IOCB_NOWAIT is that for an application that sends
> multiple write AIOs with a single io_submit(), all write AIOs after one that is
> failed with -EAGAIN will be failed with -EINVAL (not -EIO !) since they will
> have an unaligned write offset. I am wondering if that is really a problem at
> all since the application can catch the -EAGAIN and -EINVAL, indicating that the
> write stream was stopped. So maybe it is reasonable to simply allow IOCB_NOWAIT ?

I don't think supporting IOCB_NOWAIT with current zonefs is very useful.
It will be very useful once Zone Append is supported.

But more importantly my patch fixes a bug in the current zonefs
implementation.  We need to fix that before 5.6 is released.  Any
additional functionaity can still come later if we think that it is
useful.
Damien Le Moal Feb. 25, 2020, 9:54 a.m. UTC | #3
On 2020/02/25 5:19, Christoph Hellwig wrote:
> On Mon, Feb 24, 2020 at 01:48:48PM +0000, Damien Le Moal wrote:
>>
>> The main problem with allowing IOCB_NOWAIT is that for an application that sends
>> multiple write AIOs with a single io_submit(), all write AIOs after one that is
>> failed with -EAGAIN will be failed with -EINVAL (not -EIO !) since they will
>> have an unaligned write offset. I am wondering if that is really a problem at
>> all since the application can catch the -EAGAIN and -EINVAL, indicating that the
>> write stream was stopped. So maybe it is reasonable to simply allow IOCB_NOWAIT ?
> 
> I don't think supporting IOCB_NOWAIT with current zonefs is very useful.
> It will be very useful once Zone Append is supported.
> 
> But more importantly my patch fixes a bug in the current zonefs
> implementation.  We need to fix that before 5.6 is released.  Any
> additional functionaity can still come later if we think that it is
> useful.
> 

OK. Fair enough. I will apply your patch.
Thanks !
diff mbox series

Patch

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 8bc6ef82d693..69aee3dfb660 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -601,13 +601,13 @@  static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 	ssize_t ret;
 
 	/*
-	 * For async direct IOs to sequential zone files, ignore IOCB_NOWAIT
+	 * For async direct IOs to sequential zone files, refuse IOCB_NOWAIT
 	 * as this can cause write reordering (e.g. the first aio gets EAGAIN
 	 * on the inode lock but the second goes through but is now unaligned).
 	 */
-	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb)
-	    && (iocb->ki_flags & IOCB_NOWAIT))
-		iocb->ki_flags &= ~IOCB_NOWAIT;
+	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb) &&
+	    (iocb->ki_flags & IOCB_NOWAIT))
+		return -EOPNOTSUPP;
 
 	if (iocb->ki_flags & IOCB_NOWAIT) {
 		if (!inode_trylock(inode))