[09/10] xfs: nowait aio support
diff mbox

Message ID 20170524164150.9492-10-rgoldwyn@suse.de
State New
Headers show

Commit Message

Goldwyn Rodrigues May 24, 2017, 4:41 p.m. UTC
From: Goldwyn Rodrigues <rgoldwyn@suse.com>

If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable
immediately.

IF IOMAP_NOWAIT is set, return EAGAIN in xfs_file_iomap_begin
if it needs allocation either due to file extension, writing to a hole,
or COW or waiting for other DIOs to finish.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c  | 19 ++++++++++++++-----
 fs/xfs/xfs_iomap.c | 17 +++++++++++++++++
 2 files changed, 31 insertions(+), 5 deletions(-)

Comments

Darrick J. Wong May 24, 2017, 4:49 p.m. UTC | #1
On Wed, May 24, 2017 at 11:41:49AM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable
> immediately.
> 
> IF IOMAP_NOWAIT is set, return EAGAIN in xfs_file_iomap_begin
> if it needs allocation either due to file extension, writing to a hole,
> or COW or waiting for other DIOs to finish.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks good,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_file.c  | 19 ++++++++++++++-----
>  fs/xfs/xfs_iomap.c | 17 +++++++++++++++++
>  2 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 35703a801372..b307940e7d56 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -541,8 +541,11 @@ xfs_file_dio_aio_write(
>  		iolock = XFS_IOLOCK_SHARED;
>  	}
>  
> -	xfs_ilock(ip, iolock);
> -
> +	if (!xfs_ilock_nowait(ip, iolock)) {
> +		if (iocb->ki_flags & IOCB_NOWAIT)
> +			return -EAGAIN;
> +		xfs_ilock(ip, iolock);
> +	}
>  	ret = xfs_file_aio_write_checks(iocb, from, &iolock);
>  	if (ret)
>  		goto out;
> @@ -553,9 +556,15 @@ xfs_file_dio_aio_write(
>  	 * otherwise demote the lock if we had to take the exclusive lock
>  	 * for other reasons in xfs_file_aio_write_checks.
>  	 */
> -	if (unaligned_io)
> -		inode_dio_wait(inode);
> -	else if (iolock == XFS_IOLOCK_EXCL) {
> +	if (unaligned_io) {
> +		/* If we are going to wait for other DIO to finish, bail */
> +		if (iocb->ki_flags & IOCB_NOWAIT) {
> +			if (atomic_read(&inode->i_dio_count))
> +				return -EAGAIN;
> +		} else {
> +			inode_dio_wait(inode);
> +		}
> +	} else if (iolock == XFS_IOLOCK_EXCL) {
>  		xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
>  		iolock = XFS_IOLOCK_SHARED;
>  	}
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 94e5bdf7304c..8b0e3c1e086d 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1016,6 +1016,15 @@ xfs_file_iomap_begin(
>  
>  	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
>  		if (flags & IOMAP_DIRECT) {
> +			/*
> +			 * A reflinked inode will result in CoW alloc.
> +			 * FIXME: It could still overwrite on unshared extents
> +			 * and not need allocation.
> +			 */
> +			if (flags & IOMAP_NOWAIT) {
> +				error = -EAGAIN;
> +				goto out_unlock;
> +			}
>  			/* may drop and re-acquire the ilock */
>  			error = xfs_reflink_allocate_cow(ip, &imap, &shared,
>  					&lockmode);
> @@ -1033,6 +1042,14 @@ xfs_file_iomap_begin(
>  
>  	if ((flags & IOMAP_WRITE) && imap_needs_alloc(inode, &imap, nimaps)) {
>  		/*
> +		 * If nowait is set bail since we are going to make
> +		 * allocations.
> +		 */
> +		if (flags & IOMAP_NOWAIT) {
> +			error = -EAGAIN;
> +			goto out_unlock;
> +		}
> +		/*
>  		 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
>  		 * pages to keep the chunks of work done where somewhat symmetric
>  		 * with the work writeback does. This is a completely arbitrary
> -- 
> 2.12.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig May 28, 2017, 9:31 a.m. UTC | #2
Despite my previous reviewed-by tag this will need another fix:

xfs_file_iomap_begin needs to return EAGAIN if we don't have the extent
list in memoery already.  E.g. something like this:

	if ((flags & IOMAP_NOWAIT) && !(ip->i_d.if_flags & XFS_IFEXTENTS)) {
		error = -EAGAIN;
		goto out_unlock;
	}

right after locking the ilock.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Goldwyn Rodrigues May 29, 2017, 2:38 a.m. UTC | #3
On 05/28/2017 04:31 AM, Christoph Hellwig wrote:
> Despite my previous reviewed-by tag this will need another fix:
> 
> xfs_file_iomap_begin needs to return EAGAIN if we don't have the extent
> list in memoery already.  E.g. something like this:
> 
> 	if ((flags & IOMAP_NOWAIT) && !(ip->i_d.if_flags & XFS_IFEXTENTS)) {
> 		error = -EAGAIN;
> 		goto out_unlock;
> 	}
> 
> right after locking the ilock.
> 

I am not sure if it is right to penalize the application to write to
file which has been freshly opened (and is the first one to open). It
basically means extent maps needs to be read from disk. Do you see a
reason it would have a non-deterministic wait if it is the only user? I
understand the block layer can block if it has too many requests though.
Jan Kara May 29, 2017, 8:21 a.m. UTC | #4
On Sun 28-05-17 21:38:26, Goldwyn Rodrigues wrote:
> On 05/28/2017 04:31 AM, Christoph Hellwig wrote:
> > Despite my previous reviewed-by tag this will need another fix:
> > 
> > xfs_file_iomap_begin needs to return EAGAIN if we don't have the extent
> > list in memoery already.  E.g. something like this:
> > 
> > 	if ((flags & IOMAP_NOWAIT) && !(ip->i_d.if_flags & XFS_IFEXTENTS)) {
> > 		error = -EAGAIN;
> > 		goto out_unlock;
> > 	}
> > 
> > right after locking the ilock.
> 
> I am not sure if it is right to penalize the application to write to
> file which has been freshly opened (and is the first one to open). It
> basically means extent maps needs to be read from disk. Do you see a
> reason it would have a non-deterministic wait if it is the only user? I
> understand the block layer can block if it has too many requests though.

Well, submitting such write will have to wait for read of metadata from
disk. That is certainly considered blocking so Christoph is right that we
must return EAGAIN in such case.

								Honza
Christoph Hellwig May 29, 2017, 8:33 a.m. UTC | #5
On Sun, May 28, 2017 at 09:38:26PM -0500, Goldwyn Rodrigues wrote:
> 
> 
> On 05/28/2017 04:31 AM, Christoph Hellwig wrote:
> > Despite my previous reviewed-by tag this will need another fix:
> > 
> > xfs_file_iomap_begin needs to return EAGAIN if we don't have the extent
> > list in memoery already.  E.g. something like this:
> > 
> > 	if ((flags & IOMAP_NOWAIT) && !(ip->i_d.if_flags & XFS_IFEXTENTS)) {
> > 		error = -EAGAIN;
> > 		goto out_unlock;
> > 	}
> > 
> > right after locking the ilock.
> > 
> 
> I am not sure if it is right to penalize the application to write to
> file which has been freshly opened (and is the first one to open). It
> basically means extent maps needs to be read from disk. Do you see a
> reason it would have a non-deterministic wait if it is the only user? I
> understand the block layer can block if it has too many requests though.

For either a read or a write we might have to read in the extent list
(note that for few enough extents they are stored in the inode and
we won't have to), in which case the call will block and by the
semantics you define we'll need to return -EAGAIN.

Btw, can you write a small blurb up for the man page to document these
ѕemantics in man-page like language?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Goldwyn Rodrigues May 30, 2017, 4:13 p.m. UTC | #6
On 05/29/2017 03:33 AM, Christoph Hellwig wrote:
> On Sun, May 28, 2017 at 09:38:26PM -0500, Goldwyn Rodrigues wrote:
>>
>>
>> On 05/28/2017 04:31 AM, Christoph Hellwig wrote:
>>> Despite my previous reviewed-by tag this will need another fix:
>>>
>>> xfs_file_iomap_begin needs to return EAGAIN if we don't have the extent
>>> list in memoery already.  E.g. something like this:
>>>
>>> 	if ((flags & IOMAP_NOWAIT) && !(ip->i_d.if_flags & XFS_IFEXTENTS)) {
>>> 		error = -EAGAIN;
>>> 		goto out_unlock;
>>> 	}
>>>
>>> right after locking the ilock.
>>>
>>
>> I am not sure if it is right to penalize the application to write to
>> file which has been freshly opened (and is the first one to open). It
>> basically means extent maps needs to be read from disk. Do you see a
>> reason it would have a non-deterministic wait if it is the only user? I
>> understand the block layer can block if it has too many requests though.
> 
> For either a read or a write we might have to read in the extent list
> (note that for few enough extents they are stored in the inode and
> we won't have to), in which case the call will block and by the
> semantics you define we'll need to return -EAGAIN.

Yes, that is right. I will include it in.

> 
> Btw, can you write a small blurb up for the man page to document these
> ѕemantics in man-page like language?
> 

Yes, but which man page would it belong to?
Should it be a subsection of errors in io_getevents/io_submit. We don't
want to add ERRORS to io_getevents() because it would be the return
value of the io_getevents call, and not the ones in the iocb structure.
Should it be a new man page, say for iocb(7/8)?
Jan Kara May 31, 2017, 8:51 a.m. UTC | #7
On Tue 30-05-17 11:13:29, Goldwyn Rodrigues wrote:
> > Btw, can you write a small blurb up for the man page to document these
> > ѕemantics in man-page like language?
> > 
> 
> Yes, but which man page would it belong to?
> Should it be a subsection of errors in io_getevents/io_submit. We don't
> want to add ERRORS to io_getevents() because it would be the return
> value of the io_getevents call, and not the ones in the iocb structure.
> Should it be a new man page, say for iocb(7/8)?

I think you should extend the manpage for io_submit(8). There you can add
definition of struct iocb in 'DESCRIPTION' section explaining at least the
most common fields. You can also explain there which flags can be passed
and what are they intended to do.

You can also expand EAGAIN error description to specifically mention that
in case of NOWAIT aio EAGAIN can be returned if io submission would block.

								Honza

Patch
diff mbox

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 35703a801372..b307940e7d56 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -541,8 +541,11 @@  xfs_file_dio_aio_write(
 		iolock = XFS_IOLOCK_SHARED;
 	}
 
-	xfs_ilock(ip, iolock);
-
+	if (!xfs_ilock_nowait(ip, iolock)) {
+		if (iocb->ki_flags & IOCB_NOWAIT)
+			return -EAGAIN;
+		xfs_ilock(ip, iolock);
+	}
 	ret = xfs_file_aio_write_checks(iocb, from, &iolock);
 	if (ret)
 		goto out;
@@ -553,9 +556,15 @@  xfs_file_dio_aio_write(
 	 * otherwise demote the lock if we had to take the exclusive lock
 	 * for other reasons in xfs_file_aio_write_checks.
 	 */
-	if (unaligned_io)
-		inode_dio_wait(inode);
-	else if (iolock == XFS_IOLOCK_EXCL) {
+	if (unaligned_io) {
+		/* If we are going to wait for other DIO to finish, bail */
+		if (iocb->ki_flags & IOCB_NOWAIT) {
+			if (atomic_read(&inode->i_dio_count))
+				return -EAGAIN;
+		} else {
+			inode_dio_wait(inode);
+		}
+	} else if (iolock == XFS_IOLOCK_EXCL) {
 		xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
 		iolock = XFS_IOLOCK_SHARED;
 	}
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 94e5bdf7304c..8b0e3c1e086d 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1016,6 +1016,15 @@  xfs_file_iomap_begin(
 
 	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
 		if (flags & IOMAP_DIRECT) {
+			/*
+			 * A reflinked inode will result in CoW alloc.
+			 * FIXME: It could still overwrite on unshared extents
+			 * and not need allocation.
+			 */
+			if (flags & IOMAP_NOWAIT) {
+				error = -EAGAIN;
+				goto out_unlock;
+			}
 			/* may drop and re-acquire the ilock */
 			error = xfs_reflink_allocate_cow(ip, &imap, &shared,
 					&lockmode);
@@ -1033,6 +1042,14 @@  xfs_file_iomap_begin(
 
 	if ((flags & IOMAP_WRITE) && imap_needs_alloc(inode, &imap, nimaps)) {
 		/*
+		 * If nowait is set bail since we are going to make
+		 * allocations.
+		 */
+		if (flags & IOMAP_NOWAIT) {
+			error = -EAGAIN;
+			goto out_unlock;
+		}
+		/*
 		 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
 		 * pages to keep the chunks of work done where somewhat symmetric
 		 * with the work writeback does. This is a completely arbitrary