diff mbox

[3/4] xfs: adjust allocation length in xfs_alloc_space_available

Message ID 1481644767-9098-4-git-send-email-hch@lst.de (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Christoph Hellwig Dec. 13, 2016, 3:59 p.m. UTC
We must decide in xfs_alloc_fix_freelist if we can perform an
allocation from a given AG is possible or not based on the available
space, and should not fail the allocation past that point on a
healthy file system.

But currently we have two additional places that second-guess
xfs_alloc_fix_freelist: xfs_alloc_ag_vextent tries to adjust the
maxlen parameter to remove the reservation before doing the
allocation (but ignores the various minium freespace requirements),
and xfs_alloc_fix_minleft tries to fix up the allocated length
after we've found an extent, but ignores the reservations and also
doesn't take the AGFL into account (and thus fails allocations
for not matching minlen in some cases).

Remove all these later fixups and just correct the maxlen argument
inside xfs_alloc_fix_freelist once we have the AGF buffer locked.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_alloc.c | 78 +++++++++--------------------------------------
 fs/xfs/libxfs/xfs_alloc.h |  2 +-
 2 files changed, 16 insertions(+), 64 deletions(-)

Comments

Libor Klepáč Dec. 15, 2016, 8:41 p.m. UTC | #1
Hello,

On úterý 13. prosince 2016 16:59:26 CET Christoph Hellwig wrote:
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 78 ++++++++
+--------------------------------------
>  fs/xfs/libxfs/xfs_alloc.h |  2 +-
>  2 files changed, 16 insertions(+), 64 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 91a6c17..87328a8 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -1149,11 +1104,6 @@ xfs_alloc_ag_vextent_near(
>  		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
>  		ASSERT(ltbno + ltlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)-
>agf_length));
>  		args->len = blen;
> -		if (!xfs_alloc_fix_minleft(args)) {
> -			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
> -			trace_xfs_alloc_near_nominleft(args);
> -			return 0;
> -		}
>  		blen = args->len;
>  		/*
>  		 * We are allocating starting at bnew for blen blocks.

Doesn't this produce this code?

  		args->len = blen;
  		blen = args->len;


Libor

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Dec. 16, 2016, 12:28 a.m. UTC | #2
On Tue, Dec 13, 2016 at 04:59:26PM +0100, Christoph Hellwig wrote:
> We must decide in xfs_alloc_fix_freelist if we can perform an
> allocation from a given AG is possible or not based on the available
> space, and should not fail the allocation past that point on a
> healthy file system.
> 
> But currently we have two additional places that second-guess
> xfs_alloc_fix_freelist: xfs_alloc_ag_vextent tries to adjust the
> maxlen parameter to remove the reservation before doing the
> allocation (but ignores the various minium freespace requirements),
> and xfs_alloc_fix_minleft tries to fix up the allocated length
> after we've found an extent, but ignores the reservations and also
> doesn't take the AGFL into account (and thus fails allocations
> for not matching minlen in some cases).
> 
> Remove all these later fixups and just correct the maxlen argument
> inside xfs_alloc_fix_freelist once we have the AGF buffer locked.

Ok, the concept seems solid....

> @@ -2073,10 +2015,19 @@ xfs_alloc_space_available(
>  
>  	/* do we have enough free space remaining for the allocation? */
>  	available = (int)(pag->pagf_freeblks + pag->pagf_flcount -
> -			  reservation - min_free - args->total);
> +			  reservation - min_free - args->minleft);

This is fine, but...

> -	if (available < (int)args->minleft || available <= 0)
> +	if (available < (int)args->total)
>  		return false;

this is where I begin to wonder. The "args->total" logic here just
doesn't read cleanly to me. xfs_bmapi_write() says:

 * An upper bound for the number of blocks to be allocated is supplied to
 * the first call in "total"; if no allocation group has that many free
 * blocks then the call will fail (return NULLFSBLOCK in "firstblock"). 

So if we are asked to allocate 1 block, but the AG doesn't have 10
total blocks free, then allocation will fail. What this is used for
is to chain multiple independent data block allocations together in
a single transaction to attempt to get them all from the one AG.
This is used only by the directory/attr code for ensuring all the
allocations needed to add an entry to the dir/attr tree will succeed.
It's essentially an "external block reservation" as the AGF will be
held locked across the multiple allocations once the first
allocation has been done.

The only time "total" is actually meaningful is the first
allocation in a chain. i.e. when firstblock is null. It's really a
"free blocks required to proceed" parameter , not a length
bound for the current allocation.

However, it's impact is to set a maximum length bound on the
allocation, so I'm left to wonder why this was is being hidden this
inside xfs_alloc_space_available() rather than dealing with it when
setting up args->maxlen/minlen/minleft in xfs_bmap_btalloc()?

i.e args->maxlen must always be less than args->total. And if we are
using minleft to protect against running out of space for
bmbt/rmapbt allocation, then I think it should be args->maxlen +
args->minleft < args->total.

If this can all be done and enforced in xfs_bmap_btalloc(), then we
can get rid of args->total from the allocargs completely...


> +	/*
> +	 * Clamp maxlen to the amount of free space available for the actual
> +	 * extent allocation.
> +	 */
> +	if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) {
> +		args->maxlen = available;
> +		ASSERT(args->maxlen > 0);
> +	}

I'd love to get rid of all these (int) casts, too...

Cheers,

Dave.
Christoph Hellwig Dec. 16, 2016, 8:20 a.m. UTC | #3
On Thu, Dec 15, 2016 at 09:41:39PM +0100, Libor Klepáč wrote:
> > index 91a6c17..87328a8 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -1149,11 +1104,6 @@ xfs_alloc_ag_vextent_near(
> >  		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
> >  		ASSERT(ltbno + ltlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)-
> >agf_length));
> >  		args->len = blen;
> > -		if (!xfs_alloc_fix_minleft(args)) {
> > -			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
> > -			trace_xfs_alloc_near_nominleft(args);
> > -			return 0;
> > -		}
> >  		blen = args->len;
> >  		/*
> >  		 * We are allocating starting at bnew for blen blocks.
> 
> Doesn't this produce this code?
> 
>   		args->len = blen;
>   		blen = args->len;

Yes, that second line could be removed, and I'll do that in the next
iteration.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Dec. 16, 2016, 8:25 a.m. UTC | #4
On Fri, Dec 16, 2016 at 11:28:51AM +1100, Dave Chinner wrote:
> >  	/* do we have enough free space remaining for the allocation? */
> >  	available = (int)(pag->pagf_freeblks + pag->pagf_flcount -
> > -			  reservation - min_free - args->total);
> > +			  reservation - min_free - args->minleft);
> 
> This is fine, but...
> 
> > -	if (available < (int)args->minleft || available <= 0)
> > +	if (available < (int)args->total)
> >  		return false;
> 
> this is where I begin to wonder. The "args->total" logic here just
> doesn't read cleanly to me. xfs_bmapi_write() says:

args->total is a complete mess, but the above just rearranged the
deckchairs by moving it to a different place in the equation without
changing the result..

> So if we are asked to allocate 1 block, but the AG doesn't have 10
> total blocks free, then allocation will fail. What this is used for
> is to chain multiple independent data block allocations together in
> a single transaction to attempt to get them all from the one AG.
> This is used only by the directory/attr code for ensuring all the
> allocations needed to add an entry to the dir/attr tree will succeed.
> It's essentially an "external block reservation" as the AGF will be
> held locked across the multiple allocations once the first
> allocation has been done.

Symlink creation also uses it to cover the blocks for the symlink
body.  And unlike all users it actually seems to get the semantics
right by decrementing the used blocks from args->total after each
allocation..

> The only time "total" is actually meaningful is the first
> allocation in a chain. i.e. when firstblock is null. It's really a
> "free blocks required to proceed" parameter , not a length
> bound for the current allocation.

Yes.

> However, it's impact is to set a maximum length bound on the
> allocation, so I'm left to wonder why this was is being hidden this
> inside xfs_alloc_space_available() rather than dealing with it when
> setting up args->maxlen/minlen/minleft in xfs_bmap_btalloc()?
> 
> i.e args->maxlen must always be less than args->total. And if we are
> using minleft to protect against running out of space for
> bmbt/rmapbt allocation, then I think it should be args->maxlen +
> args->minleft < args->total.
> 
> If this can all be done and enforced in xfs_bmap_btalloc(), then we
> can get rid of args->total from the allocargs completely...

My long terms plan was to kill it of in favour of passing a minleft
parameter that's only set on the first call to xfs_bmapi_write.
But I didn't fancy rewriting hairy parts of the dir code while trying
to get an urgent customer escalation fixed..

> 
> 
> > +	/*
> > +	 * Clamp maxlen to the amount of free space available for the actual
> > +	 * extent allocation.
> > +	 */
> > +	if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) {
> > +		args->maxlen = available;
> > +		ASSERT(args->maxlen > 0);
> > +	}
> 
> I'd love to get rid of all these (int) casts, too...

The problem here is that we compare 32-bit signed to 32-bit unsigned
variables.  And given that this is ripe for nasty bugs due to the C
type promotion rules I'd rather be extra careful.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 91a6c17..87328a8 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -365,36 +365,12 @@  xfs_alloc_fix_len(
 		return;
 	ASSERT(rlen >= args->minlen && rlen <= args->maxlen);
 	ASSERT(rlen % args->prod == args->mod);
+	ASSERT(args->pag->pagf_freeblks + args->pag->pagf_flcount >=
+		rlen + args->minleft);
 	args->len = rlen;
 }
 
 /*
- * Fix up length if there is too little space left in the a.g.
- * Return 1 if ok, 0 if too little, should give up.
- */
-STATIC int
-xfs_alloc_fix_minleft(
-	xfs_alloc_arg_t	*args)		/* allocation argument structure */
-{
-	xfs_agf_t	*agf;		/* a.g. freelist header */
-	int		diff;		/* free space difference */
-
-	if (args->minleft == 0)
-		return 1;
-	agf = XFS_BUF_TO_AGF(args->agbp);
-	diff = be32_to_cpu(agf->agf_freeblks)
-		- args->len - args->minleft;
-	if (diff >= 0)
-		return 1;
-	args->len += diff;		/* shrink the allocated space */
-	/* casts to (int) catch length underflows */
-	if ((int)args->len >= (int)args->minlen)
-		return 1;
-	args->agbno = NULLAGBLOCK;
-	return 0;
-}
-
-/*
  * Update the two btrees, logically removing from freespace the extent
  * starting at rbno, rlen blocks.  The extent is contained within the
  * actual (current) free extent fbno for flen blocks.
@@ -689,8 +665,6 @@  xfs_alloc_ag_vextent(
 	xfs_alloc_arg_t	*args)	/* argument structure for allocation */
 {
 	int		error=0;
-	xfs_extlen_t	reservation;
-	xfs_extlen_t	oldmax;
 
 	ASSERT(args->minlen > 0);
 	ASSERT(args->maxlen > 0);
@@ -699,20 +673,6 @@  xfs_alloc_ag_vextent(
 	ASSERT(args->alignment > 0);
 
 	/*
-	 * Clamp maxlen to the amount of free space minus any reservations
-	 * that have been made.
-	 */
-	oldmax = args->maxlen;
-	reservation = xfs_ag_resv_needed(args->pag, args->resv);
-	if (args->maxlen > args->pag->pagf_freeblks - reservation)
-		args->maxlen = args->pag->pagf_freeblks - reservation;
-	if (args->maxlen == 0) {
-		args->agbno = NULLAGBLOCK;
-		args->maxlen = oldmax;
-		return 0;
-	}
-
-	/*
 	 * Branch to correct routine based on the type.
 	 */
 	args->wasfromfl = 0;
@@ -731,8 +691,6 @@  xfs_alloc_ag_vextent(
 		/* NOTREACHED */
 	}
 
-	args->maxlen = oldmax;
-
 	if (error || args->agbno == NULLAGBLOCK)
 		return error;
 
@@ -841,9 +799,6 @@  xfs_alloc_ag_vextent_exact(
 	args->len = XFS_AGBLOCK_MIN(tend, args->agbno + args->maxlen)
 						- args->agbno;
 	xfs_alloc_fix_len(args);
-	if (!xfs_alloc_fix_minleft(args))
-		goto not_found;
-
 	ASSERT(args->agbno + args->len <= tend);
 
 	/*
@@ -1149,11 +1104,6 @@  xfs_alloc_ag_vextent_near(
 		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
 		ASSERT(ltbno + ltlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
 		args->len = blen;
-		if (!xfs_alloc_fix_minleft(args)) {
-			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-			trace_xfs_alloc_near_nominleft(args);
-			return 0;
-		}
 		blen = args->len;
 		/*
 		 * We are allocating starting at bnew for blen blocks.
@@ -1346,12 +1296,6 @@  xfs_alloc_ag_vextent_near(
 	 */
 	args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
 	xfs_alloc_fix_len(args);
-	if (!xfs_alloc_fix_minleft(args)) {
-		trace_xfs_alloc_near_nominleft(args);
-		xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_NOERROR);
-		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-		return 0;
-	}
 	rlen = args->len;
 	(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
 				     args->datatype, ltbnoa, ltlena, &ltnew);
@@ -1553,8 +1497,6 @@  xfs_alloc_ag_vextent_size(
 	}
 	xfs_alloc_fix_len(args);
 
-	if (!xfs_alloc_fix_minleft(args))
-		goto out_nominleft;
 	rlen = args->len;
 	XFS_WANT_CORRUPTED_GOTO(args->mp, rlen <= flen, error0);
 	/*
@@ -2073,10 +2015,19 @@  xfs_alloc_space_available(
 
 	/* do we have enough free space remaining for the allocation? */
 	available = (int)(pag->pagf_freeblks + pag->pagf_flcount -
-			  reservation - min_free - args->total);
-	if (available < (int)args->minleft || available <= 0)
+			  reservation - min_free - args->minleft);
+	if (available < (int)args->total)
 		return false;
 
+	/*
+	 * Clamp maxlen to the amount of free space available for the actual
+	 * extent allocation.
+	 */
+	if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) {
+		args->maxlen = available;
+		ASSERT(args->maxlen > 0);
+	}
+
 	return true;
 }
 
@@ -2122,7 +2073,8 @@  xfs_alloc_fix_freelist(
 	}
 
 	need = xfs_alloc_min_freelist(mp, pag);
-	if (!xfs_alloc_space_available(args, need, flags))
+	if (!xfs_alloc_space_available(args, need, flags |
+			XFS_ALLOC_FLAG_CHECK))
 		goto out_agbp_relse;
 
 	/*
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 7c404a6..1d0f48a 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -56,7 +56,7 @@  typedef unsigned int xfs_alloctype_t;
 #define	XFS_ALLOC_FLAG_FREEING	0x00000002  /* indicate caller is freeing extents*/
 #define	XFS_ALLOC_FLAG_NORMAP	0x00000004  /* don't modify the rmapbt */
 #define	XFS_ALLOC_FLAG_NOSHRINK	0x00000008  /* don't shrink the freelist */
-
+#define	XFS_ALLOC_FLAG_CHECK	0x00000010  /* test only, don't modify args */
 
 /*
  * Argument structure for xfs_alloc routines.