| Message ID | 1481644767-9098-2-git-send-email-hch@lst.de (mailing list archive) |
|---|---|
| State | Superseded |
On Wed, Dec 14, 2016 at 08:36:26PM +0100, Christoph Hellwig wrote:
> On Wed, Dec 14, 2016 at 12:35:07PM -0500, Brian Foster wrote:
> > I'm still trying to grok the minleft/dop_low stuff, but doesn't this
> > increase the risk of bmbt block allocation failure?
>
> No. If working correctly it reduces the chance of a failing bmbt block
> allocation in a transaction to zero, at the expense of potentially
> failing whole block allocations while we still have a few blocks left.
> But given that the first leads to a fs shutdown and the latter just to
> an ENOSPC it seems worthwhile.

Indeed, I completely agree in principle with the series. It's much better
to leave some free space efficiency on the table for greater reliability.
I just want to make sure I understand how this change works and whether
it does so correctly...

> > E.g., args.minleft is set to the transaction reservation earlier in
> > the function. AFAIK the res is not guaranteed to be in a particular
> > AG, but in that case we're just setting a start AG and ultimately
> > cycling through the AGs. If that fails, this hunk resets minleft,
> > restarts at fsbno 0 and activates the sequential agno allocation
> > requirement for subsequent allocs. Doesn't removing this logic mean
> > we can successfully reserve blocks in a write transaction that will
> > never ultimately be able to allocate bmbt blocks, even if the data
> > blocks can all be allocated..?
>
> We only set it to res if no firstblock was set. The only case where
> that is possible is during the first bmbt block allocation in a reflink
> operation - for actual direct I/O writes, fallocate calls or unwritten
> extent conversion we will always have firstblock set from the allocation
> of the actual data extent. And that allocation now has the proper
> minleft set after this series so that we are guaranteed we have enough
> space in one single AG for the whole allocation.

Ok, I need to stare at that codepath a bit more..
In the meantime, what about the delalloc codepath as well? We reserve
indlen out of the global pool at write time and track the res in the
in-core extents. Writeback comes around later and we try to allocate
real blocks without any need for transaction reservation. Now that we
set 'args.minleft = xfs_bmap_worst_indlen()' in xfs_bmap_btalloc() and
never reset it, what guarantees any particular AG can satisfy that
request based on the global reservation?

I'm basically just a little nervous that we might rely on some of these
minleft hacks to avoid serious failures like in bmbt allocation or
writeback, despite the fact that the broader mechanism may be bogus...

Brian

> In the reflink case where firstblock wasn't set the minleft ensures
> that we fail the first allocation if we don't have enough blocks
> in a single AG.
> retryloop to try the allocation
>
> > The other thing to note here is that minleft is initialized to 0 and
> > only set to the transaction res conditionally, so if firstblock is
> > already set we won't use minleft at all.
>
> Yes, that's intentional - we already did our minleft check when
> allocating the data extent.

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
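[For context on the number being fed into minleft here: xfs_bmap_worst_indlen() computes the worst-case count of bmbt (indirect) blocks needed to map an extent of a given length. A simplified standalone model of that calculation is sketched below; the fixed records-per-block value is illustrative only, and the real function takes per-level maxrecs from the mount structure and caps the loop at the maximum btree height.]

```c
#include <assert.h>

/*
 * Toy model of the worst-case indirect (bmbt) block count for mapping
 * an extent of 'len' blocks, assuming 'maxrecs' records fit in every
 * btree block.  At each level we need ceil(len / maxrecs) blocks to
 * point at the blocks of the level below, until one block suffices.
 */
static unsigned long worst_indlen(unsigned long len, unsigned long maxrecs)
{
	unsigned long rval = 0;

	while (len > 1) {
		len = (len + maxrecs - 1) / maxrecs;	/* blocks this level */
		rval += len;
	}
	return rval;
}
```

With 10 records per block, mapping 1,000 blocks needs up to 100 + 10 + 1
= 111 btree blocks in the worst case - the kind of per-AG headroom that
writeback would have to find later, with only the global reservation
backing it.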
On Tue, Dec 13, 2016 at 04:59:24PM +0100, Christoph Hellwig wrote:
> We can't just set minleft to 0 when we're low on space - that's exactly
> what we need minleft for: to protect space in the AG for btree block
> allocations when we are low on free space.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

....

The xfs_alloc_vextent() loop changes look fine. The minleft hacks
always troubled me.

> ---
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 2760bc3..44773c9 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3812,7 +3812,6 @@ xfs_bmap_btalloc(
>  		args.fsbno = 0;
>  		args.type = XFS_ALLOCTYPE_FIRST_AG;
>  		args.total = ap->minlen;
> -		args.minleft = 0;
>  		if ((error = xfs_alloc_vextent(&args)))
>  			return error;
>  		ap->dfops->dop_low = true;

But this looks wrong. When combined with the next patch that sets:

	args.minleft = xfs_bmap_worst_indlen(ap->ip, args.maxlen);

we've got a minleft value that might be for a multigigabyte
allocation, but now we are only asking for a total allocation
of minlen, and that might be 1 block. IOWs, shouldn't this "last,
final attempt" code actually do something like this:

	args.maxlen = ap->minlen;
	args.total = ap->minlen;
	args.minleft = xfs_bmap_worst_indlen(ap->ip, args.maxlen);

> +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> @@ -499,20 +499,6 @@ xfs_bmbt_alloc_block(
>  		goto try_another_ag;
>  	}
>
> -	if (args.fsbno == NULLFSBLOCK && args.minleft) {
> -		/*
> -		 * Could not find an AG with enough free space to satisfy
> -		 * a full btree split. Try again without minleft and if
> -		 * successful activate the lowspace algorithm.
> -		 */
> -		args.fsbno = 0;
> -		args.type = XFS_ALLOCTYPE_FIRST_AG;
> -		args.minleft = 0;
> -		error = xfs_alloc_vextent(&args);
> -		if (error)
> -			goto error0;
> -		cur->bc_private.b.dfops->dop_low = true;
> -	}
>  	if (args.fsbno == NULLFSBLOCK) {
>  		XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
>  		*stat = 0;

Yes, that's fair enough considering the comment in the block that set
args.minleft, and that was a bug fix introduced long after this
fallback was put in place because the fallback could still fail...

Cheers,
Dave.
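[To put rough numbers on the mismatch Dave describes - a toy model only: a fixed records-per-block value, hypothetical helper names, and not the real per-level XFS calculation:]

```c
#include <assert.h>

/* Toy worst-case bmbt block count: assumes 'maxrecs' records fit in
 * every btree block (the real xfs_bmap_worst_indlen() is per-level). */
static unsigned long worst_indlen(unsigned long len, unsigned long maxrecs)
{
	unsigned long rval = 0;

	while (len > 1) {
		len = (len + maxrecs - 1) / maxrecs;	/* blocks this level */
		rval += len;
	}
	return rval;
}

/* An AG can satisfy a request only if 'minleft' blocks would still be
 * free after allocating 'total' blocks from it. */
static int ag_can_alloc(unsigned long freeblks, unsigned long total,
			unsigned long minleft)
{
	return freeblks >= total + minleft;
}
```

With an args.maxlen of a million blocks and 10 records per block,
minleft comes out above 111,000 blocks, so an AG with a few hundred
blocks free fails the check even though the fallback only asks for
total = minlen = 1 block. Recomputing minleft from the clamped maxlen,
as suggested above, drops it to 0 for a single-block request.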
On Fri, Dec 16, 2016 at 09:09:38AM +1100, Dave Chinner wrote:
> > @@ -3812,7 +3812,6 @@ xfs_bmap_btalloc(
> >  		args.fsbno = 0;
> >  		args.type = XFS_ALLOCTYPE_FIRST_AG;
> >  		args.total = ap->minlen;
> > -		args.minleft = 0;
> >  		if ((error = xfs_alloc_vextent(&args)))
> >  			return error;
> >  		ap->dfops->dop_low = true;
>
> But this looks wrong. When combined with the next patch that sets:
>
> 	args.minleft = xfs_bmap_worst_indlen(ap->ip, args.maxlen);
>
> we've got a minleft value that might be for a multigigabyte
> allocation, but now we are only asking for a total allocation
> of minlen, and that might be 1 block. IOWs, shouldn't this "last,
> final attempt" code actually do something like this:
>
> 	args.maxlen = ap->minlen;
> 	args.total = ap->minlen;
> 	args.minleft = xfs_bmap_worst_indlen(ap->ip, args.maxlen);

Yes. And I actually had that in a previous iteration, and it got
dropped at some point.
On Thu, Dec 15, 2016 at 09:34:33AM -0500, Brian Foster wrote:
> FWIW, I was playing with this a bit more and managed to manufacture a
> filesystem layout that this series doesn't handle too well. Emphasis on
> "manufactured" because this might not be a likely real world scenario,
> but either way the current code handles it fine.
>
> I've attached a metadump of the offending image. mdrestore it, mount it
> and attempt something like 'dd if=/dev/zero of=/mnt/file' on the root.
> The buffered write looks like it's in a livelock, waiting indefinitely
> for a writeback cycle that will never complete...

Ok, I'll give it a spin.
On Fri, Dec 16, 2016 at 09:20:44AM +0100, Christoph Hellwig wrote:
> On Fri, Dec 16, 2016 at 09:09:38AM +1100, Dave Chinner wrote:
> > > @@ -3812,7 +3812,6 @@ xfs_bmap_btalloc(
> > >  		args.fsbno = 0;
> > >  		args.type = XFS_ALLOCTYPE_FIRST_AG;
> > >  		args.total = ap->minlen;
> > > -		args.minleft = 0;
> > >  		if ((error = xfs_alloc_vextent(&args)))
> > >  			return error;
> > >  		ap->dfops->dop_low = true;
> >
> > But this looks wrong. When combined with the next patch that sets:
> >
> > 	args.minleft = xfs_bmap_worst_indlen(ap->ip, args.maxlen);
> >
> > we've got a minleft value that might be for a multigigabyte
> > allocation, but now we are only asking for a total allocation
> > of minlen, and that might be 1 block. IOWs, shouldn't this "last,
> > final attempt" code actually do something like this:
> >
> > 	args.maxlen = ap->minlen;
> > 	args.total = ap->minlen;
> > 	args.minleft = xfs_bmap_worst_indlen(ap->ip, args.maxlen);
>
> Yes. And I actually had that in a previous iteration, and it got
> dropped at some point.

I see the diff hunk in question hasn't changed in the v2 patchset...
Did Dave's suggestion not work?

--D
> > Yes. And I actually had that in a previous iteration, and it got
> > dropped at some point.
>
> I see the diff hunk in question hasn't changed in the v2 patchset...
> Did Dave's suggestion not work?

We don't need it anymore. The previous iteration set minleft to the
worst case indirect blocks for the requested extent length. This series
instead sticks to the old minleft value for just a single extent
allocation and thus doesn't need the fallback, as the minleft value
can't get any smaller.
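[The allocator behavior the series settles on - sweep the AGs once in trylock mode, once more in blocking mode, then fail, with minleft enforced on every attempt rather than zeroed on a third pass - can be sketched as a toy model. ag_fits() stands in for xfs_alloc_fix_freelist()'s space check; all names and numbers here are illustrative, not the real XFS code:]

```c
#include <assert.h>

/*
 * Toy stand-in for xfs_alloc_fix_freelist()'s space check: an AG
 * qualifies only if 'minleft' blocks would still be free after
 * allocating 'total' blocks from it.
 */
static int ag_fits(const unsigned long *freeblks, int agno,
		   unsigned long total, unsigned long minleft)
{
	return freeblks[agno] >= total + minleft;
}

/*
 * Toy model of the reworked AG rotor: one pass over all AGs in trylock
 * mode, one more pass in blocking mode, then hard failure - there is no
 * longer a third pass with minleft forced to zero.  Returns the AG that
 * satisfied the request, or -1 for ENOSPC.
 */
static int alloc_vextent(const unsigned long *freeblks, int agcount,
			 int sagno, unsigned long total, unsigned long minleft)
{
	int trylock = 1;	/* first sweep must not block */
	int agno = sagno;

	for (;;) {
		/* in this toy, trylock mode never actually skips an AG */
		if (ag_fits(freeblks, agno, total, minleft))
			return agno;
		if (++agno == agcount)
			agno = 0;
		if (agno == sagno) {
			if (!trylock)
				return -1;	/* both sweeps failed */
			trylock = 0;		/* sweep again, blocking */
		}
	}
}
```

E.g., with per-AG free counts {5, 50, 5} a request for 10 blocks with
minleft 20 lands in AG 1; raise minleft to 100 and the rotor now fails
outright instead of quietly retrying with minleft dropped to 0.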
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 5050056..91a6c17 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2638,12 +2638,10 @@ xfs_alloc_vextent(
 	xfs_agblock_t	agsize;	/* allocation group size */
 	int		error;
 	int		flags;	/* XFS_ALLOC_FLAG_... locking flags */
-	xfs_extlen_t	minleft;/* minimum left value, temp copy */
 	xfs_mount_t	*mp;	/* mount structure pointer */
 	xfs_agnumber_t	sagno;	/* starting allocation group number */
 	xfs_alloctype_t	type;	/* input allocation type */
 	int		bump_rotor = 0;
-	int		no_min = 0;
 	xfs_agnumber_t	rotorstep = xfs_rotorstep; /* inode32 agf stepper */
 
 	mp = args->mp;
@@ -2672,7 +2670,6 @@ xfs_alloc_vextent(
 		trace_xfs_alloc_vextent_badargs(args);
 		return 0;
 	}
-	minleft = args->minleft;
 
 	switch (type) {
 	case XFS_ALLOCTYPE_THIS_AG:
@@ -2683,9 +2680,7 @@ xfs_alloc_vextent(
 		 */
 		args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
 		args->pag = xfs_perag_get(mp, args->agno);
-		args->minleft = 0;
 		error = xfs_alloc_fix_freelist(args, 0);
-		args->minleft = minleft;
 		if (error) {
 			trace_xfs_alloc_vextent_nofix(args);
 			goto error0;
@@ -2750,9 +2745,7 @@ xfs_alloc_vextent(
 		 */
 		for (;;) {
 			args->pag = xfs_perag_get(mp, args->agno);
-			if (no_min)
-				args->minleft = 0;
 			error = xfs_alloc_fix_freelist(args, flags);
-			args->minleft = minleft;
 			if (error) {
 				trace_xfs_alloc_vextent_nofix(args);
 				goto error0;
@@ -2792,20 +2785,17 @@ xfs_alloc_vextent(
 			 * or switch to non-trylock mode.
 			 */
 			if (args->agno == sagno) {
-				if (no_min == 1) {
+				if (flags == 0) {
 					args->agbno = NULLAGBLOCK;
 					trace_xfs_alloc_vextent_allfailed(args);
 					break;
 				}
-				if (flags == 0) {
-					no_min = 1;
-				} else {
-					flags = 0;
-					if (type == XFS_ALLOCTYPE_START_BNO) {
-						args->agbno = XFS_FSB_TO_AGBNO(mp,
-							args->fsbno);
-						args->type = XFS_ALLOCTYPE_NEAR_BNO;
-					}
+
+				flags = 0;
+				if (type == XFS_ALLOCTYPE_START_BNO) {
+					args->agbno = XFS_FSB_TO_AGBNO(mp,
+						args->fsbno);
+					args->type = XFS_ALLOCTYPE_NEAR_BNO;
 				}
 			}
 			xfs_perag_put(args->pag);
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 2760bc3..44773c9 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3812,7 +3812,6 @@ xfs_bmap_btalloc(
 		args.fsbno = 0;
 		args.type = XFS_ALLOCTYPE_FIRST_AG;
 		args.total = ap->minlen;
-		args.minleft = 0;
 		if ((error = xfs_alloc_vextent(&args)))
 			return error;
 		ap->dfops->dop_low = true;
@@ -4344,8 +4343,6 @@ xfs_bmapi_allocate(
 	if (error)
 		return error;
 
-	if (bma->dfops->dop_low)
-		bma->minleft = 0;
 	if (bma->cur)
 		bma->cur->bc_private.b.firstblock = *bma->firstblock;
 	if (bma->blkno == NULLFSBLOCK)
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index d6330c2..9581ee8 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -499,20 +499,6 @@ xfs_bmbt_alloc_block(
 		goto try_another_ag;
 	}
 
-	if (args.fsbno == NULLFSBLOCK && args.minleft) {
-		/*
-		 * Could not find an AG with enough free space to satisfy
-		 * a full btree split. Try again without minleft and if
-		 * successful activate the lowspace algorithm.
-		 */
-		args.fsbno = 0;
-		args.type = XFS_ALLOCTYPE_FIRST_AG;
-		args.minleft = 0;
-		error = xfs_alloc_vextent(&args);
-		if (error)
-			goto error0;
-		cur->bc_private.b.dfops->dop_low = true;
-	}
 	if (args.fsbno == NULLFSBLOCK) {
 		XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
 		*stat = 0;
We can't just set minleft to 0 when we're low on space - that's exactly
what we need minleft for: to protect space in the AG for btree block
allocations when we are low on free space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_alloc.c      | 24 +++++++-----------------
 fs/xfs/libxfs/xfs_bmap.c       |  3 ---
 fs/xfs/libxfs/xfs_bmap_btree.c | 14 --------------
 3 files changed, 7 insertions(+), 34 deletions(-)