xfs: fix bogus space reservation in xfs_iomap_write_allocate

Message ID 1470245586-14068-1-git-send-email-hch@lst.de (mailing list archive)
State Accepted

Commit Message

Christoph Hellwig Aug. 3, 2016, 5:33 p.m. UTC
The space reservation was added without an explanation in commit

    "Add error reporting calls in error paths that return EFSCORRUPTED"

back in 2003.  There is no reason to reserve disk blocks in the
transaction when allocating blocks for delalloc space as we already
reserved the space when creating the delalloc extent.

With this fix we stop running out of the reserved pool in generic/229,
which has happened for a long time with small blocksize file systems,
and has increased in severity with the new buffered write path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)
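
The accounting argument in the commit message can be sketched with a toy
model in plain C. This is not XFS code: toy_fs, toy_reserve,
toy_delalloc_create and toy_writeback_convert are hypothetical names, and
the model deliberately ignores the reserved pool, quotas and per-AG
accounting. It only shows why asking for blocks a second time at writeback
can fail on a full file system even though the space was already set aside.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of a global free-block counter; not XFS code. */
struct toy_fs {
	int64_t free_blocks;	/* blocks not yet promised to anyone */
};

/* Take nblocks out of the free counter; fail if not enough are left. */
static bool toy_reserve(struct toy_fs *fs, int64_t nblocks)
{
	assert(nblocks >= 0);
	if (fs->free_blocks < nblocks)
		return false;
	fs->free_blocks -= nblocks;
	return true;
}

/*
 * Buffered write: the delalloc extent reserves its data blocks plus a
 * worst-case number of indirect (mapping btree) blocks up front.
 */
static bool toy_delalloc_create(struct toy_fs *fs, int64_t data,
				int64_t indirect)
{
	return toy_reserve(fs, data + indirect);
}

/*
 * Writeback: convert the delalloc extent to a real one. extra_res models
 * a block reservation taken by the conversion transaction on top of what
 * the delalloc extent already holds.
 */
static bool toy_writeback_convert(struct toy_fs *fs, int64_t extra_res)
{
	if (!toy_reserve(fs, extra_res))
		return false;	/* fails although the space is already ours */

	/* The allocation is paid for by the delalloc reservation, so the
	 * transaction's own reservation goes back unused. */
	fs->free_blocks += extra_res;
	return true;
}
```

In this model, once a delalloc extent has consumed the last free blocks, a
conversion with a non-zero extra_res fails, while a conversion with
extra_res set to 0 succeeds.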

Comments

Dave Chinner Aug. 5, 2016, 12:03 a.m. UTC | #1
On Wed, Aug 03, 2016 at 07:33:06PM +0200, Christoph Hellwig wrote:
> The space reservation was added without an explanation in commit
> 
>     "Add error reporting calls in error paths that return EFSCORRUPTED"
> 
> back in 2003.  There is no reason to reserve disk blocks in the
> transaction when allocating blocks for delalloc space as we already
> reserved the space when creating the delalloc extent.
> 
> With this fix we stop running out of the reserved pool in generic/229,
> which has happened for a long time with small blocksize file systems,
> and has increased in severity with the new buffered write path.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iomap.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 2114d53..279353c 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -691,7 +691,6 @@ xfs_iomap_write_allocate(
>  	xfs_trans_t	*tp;
>  	int		nimaps;
>  	int		error = 0;
> -	int		nres;
>  
>  	/*
>  	 * Make sure that the dquots are there.
> @@ -715,12 +714,15 @@ xfs_iomap_write_allocate(
>  		 * is in the delayed allocation extent on which we sit
>  		 * but before our buffer starts.
>  		 */
> -
>  		nimaps = 0;
>  		while (nimaps == 0) {
> -			nres = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
> -
> -			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, nres,
> +			/*
> +			 * We have already reserved space for the extent and any
> +			 * indirect blocks when creating the delalloc extent,
> +			 * there is no need to reserve space in this transaction
> +			 * again.
> +			 */
> +			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0,
>  					0, XFS_TRANS_RESERVE, &tp);
>  			if (error)
>  				return error;
> @@ -783,7 +785,7 @@ xfs_iomap_write_allocate(
>  			 */
>  			error = xfs_bmapi_write(tp, ip, map_start_fsb,
>  						count_fsb, 0, &first_block,
> -						nres, imap, &nimaps,
> +						0, imap, &nimaps,
>  						&dfops);

I don't think this part of the fix is correct. nres feeds into
args->total which is then used during the AGFL fixup checks. If this
is not set correctly, then we'll select AGs that have enough space to
fix up the AGFL, but not enough space to allocate all the
BMBT blocks we require. That then leads to ABBA deadlocks on AGF
locks near ENOSPC - see commit dbd5c8c ("xfs: pass total block
res. as total xfs_bmapi_write() parameter") for the full details.
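
The concern above can be sketched with a toy AG selection check in plain C.
This is a simplified model under stated assumptions, not the kernel's
allocator: toy_ag, toy_ag_usable and toy_pick_ag are hypothetical names, and
the check is reduced to "blocks left after the AGFL fixup must cover the
larger of minlen and total".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-AG state; not the real XFS structures. */
struct toy_ag {
	int64_t free_blocks;	/* free blocks in this AG */
	int64_t agfl_need;	/* blocks needed to refill the AGFL */
};

/*
 * An AG is usable if, after the AGFL fixup, it can still hold the whole
 * request. total is the caller's estimate of all blocks the operation
 * needs (data plus BMBT blocks); minlen is the smallest acceptable extent.
 */
static bool toy_ag_usable(const struct toy_ag *ag, int64_t minlen,
			  int64_t total)
{
	int64_t available = ag->free_blocks - ag->agfl_need;
	int64_t want = total > minlen ? total : minlen;

	return available >= want;
}

/* Return the index of the first usable AG, or -1 if none qualifies. */
static int toy_pick_ag(const struct toy_ag *ags, int nags, int64_t minlen,
		       int64_t total)
{
	for (int i = 0; i < nags; i++)
		if (toy_ag_usable(&ags[i], minlen, total))
			return i;
	return -1;
}
```

With total passed as 0, the model accepts a nearly full AG that can satisfy
minlen but has no room for the BMBT blocks, so the operation would have to
move on to a second AG while still holding the first one. With a realistic
total, that AG is skipped up front.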

Since you pointed out the problem, I've been testing a local version
of this fix that still passes nres into xfs_bmapi_write(), and I
haven't seen any problems, so I think it is correct to keep nres
here. I'm going to drop this hunk from this patch for the moment in
my tree.

Cheers,

Dave.

Christoph Hellwig Aug. 11, 2016, 4:03 p.m. UTC | #2
On Fri, Aug 05, 2016 at 10:03:54AM +1000, Dave Chinner wrote:
> I don't think this part of the fix is correct. nres feeds into
> args->total which is then used during the AGFL fixup checks. If this
> is not set correctly, then we'll select AGs that have enough space to
> fix up the AGFL, but not enough space to allocate all the
> BMBT blocks we require. That then leads to ABBA deadlocks on AGF
> locks near ENOSPC - see commit dbd5c8c ("xfs: pass total block
> res. as total xfs_bmapi_write() parameter") for the full details.

I've been going back and forth between both versions and both have
tested fine; I couldn't really convince myself which one is more correct.

> Since you pointed out the problem, I've been testing a local version
> of this fix that still passes nres into xfs_bmapi_write(), and I
> haven't seen any problems, so I think it is correct to keep nres
> here. I'm going to drop this hunk from this patch for the moment in
> my tree.

Ok, sounds fine.  If you want a real resend let me know.

Patch

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 2114d53..279353c 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -691,7 +691,6 @@  xfs_iomap_write_allocate(
 	xfs_trans_t	*tp;
 	int		nimaps;
 	int		error = 0;
-	int		nres;
 
 	/*
 	 * Make sure that the dquots are there.
@@ -715,12 +714,15 @@  xfs_iomap_write_allocate(
 		 * is in the delayed allocation extent on which we sit
 		 * but before our buffer starts.
 		 */
-
 		nimaps = 0;
 		while (nimaps == 0) {
-			nres = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
-
-			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, nres,
+			/*
+			 * We have already reserved space for the extent and any
+			 * indirect blocks when creating the delalloc extent,
+			 * there is no need to reserve space in this transaction
+			 * again.
+			 */
+			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0,
 					0, XFS_TRANS_RESERVE, &tp);
 			if (error)
 				return error;
@@ -783,7 +785,7 @@  xfs_iomap_write_allocate(
 			 */
 			error = xfs_bmapi_write(tp, ip, map_start_fsb,
 						count_fsb, 0, &first_block,
-						nres, imap, &nimaps,
+						0, imap, &nimaps,
 						&dfops);
 			if (error)
 				goto trans_cancel;