Message ID | 20240219063450.3032254-9-hch@lst.de (mailing list archive) |
---|---|
State | Superseded, archived |
Series | [1/9] xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitions |
On Mon, Feb 19, 2024 at 07:34:49AM +0100, Christoph Hellwig wrote:
> When xfs_bmap_del_extent_delay has to split an indirect block it tries
> to steal blocks from the part that gets unmapped to increase the
> indirect block reservation that now needs to cover for two extents
> instead of one.
>
> This works perfectly fine on the data device, where the data and
> indirect blocks come from the same pool. It has no chance of working
> when the inode sits on the RT device. To support re-enabling delalloc
> for inodes on the RT device, make this behavior conditional on not
> being for rt extents. For an RT extent try to allocate new blocks or
> otherwise just give up.
>
> Note that split of delalloc extents should only happen on writeback
> failure, as for other kinds of hole punching we first write back all
> data and thus convert the delalloc reservations covering the hole to
> a real allocation.
>
> Note that restoring a quota reservation is always a bit problematic,
> but the force flag should take care of it. That is, if we actually
> supported quota with the RT volume, which seems to not be the case
> at the moment.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 8a84b7f0b55f38..a137abf435eeba 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4912,6 +4912,30 @@ xfs_bmap_del_extent_delay(
>  		WARN_ON_ONCE(!got_indlen || !new_indlen);
>  		stolen = xfs_bmap_split_indlen(da_old, &got_indlen, &new_indlen,
>  						       del->br_blockcount);
> +		if (isrt && stolen) {
> +			/*
> +			 * Ugg, we can't just steal reservations from the data
> +			 * blocks as the data blocks come from a different pool.
> +			 *
> +			 * So we have to try to increase our reservations here,
> +			 * and if that fails we have to fail the unmap.  To
> +			 * avoid that as much as possible dip into the reserve
> +			 * pool.
> +			 *
> +			 * Note that in theory the user/group/project could
> +			 * be over the quota limit in the meantime, thus we
> +			 * force the quota accounting even if it was over the
> +			 * limit.
> +			 */
> +			error = xfs_dec_fdblocks(mp, stolen, true);
> +			if (error) {
> +				ip->i_delayed_blks += del->br_blockcount;
> +				xfs_trans_reserve_quota_nblks(NULL, ip, 0,
> +						del->br_blockcount, true);
> +				return error;
> +			}
> +			xfs_mod_delalloc(ip, 0, stolen);
> +		}

Ok. If you delay the ip->i_delayed_blks and quota accounting until
after the incore extent tree updates are done, this code doesn't
need to undo anything and can just return an error. We should also
keep in mind that an error here will likely cause a filesystem
shutdown when the transaction is canceled....

FWIW, if we are going to do this for rt, we should probably also
consider doing it for normal delalloc conversion when the indlen
reservation runs out due to excessive fragmentation of large
extents. Separate patch and all that, but it doesn't really make
sense to me to only do this for RT when we know it is also needed in
rare cases on non-rt workloads...

-Dave.
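
The reordering suggested in the reply above (do the step that can fail first, touch the bookkeeping afterwards) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `toy_inode`, `toy_reserve` and `toy_unmap_delay` are hypothetical names invented for illustration, not kernel functions, and the model ignores the extent tree, transactions and the reserve pool entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-inode counters touched by an unmap. */
struct toy_inode {
	uint64_t delayed_blks;	/* models ip->i_delayed_blks */
	uint64_t quota_res;	/* models the quota reservation */
};

/* Fallible step: take n blocks from a free-block counter. */
static int toy_reserve(uint64_t *free_blocks, uint64_t n)
{
	if (*free_blocks < n)
		return -ENOSPC;
	*free_blocks -= n;
	return 0;
}

/*
 * Unmap del_len delalloc blocks, needing extra_res additional
 * reservation blocks.  The fallible reservation runs before any
 * counter is modified, so the error path has nothing to undo.
 */
static int toy_unmap_delay(struct toy_inode *ip, uint64_t *free_blocks,
			   uint64_t del_len, uint64_t extra_res)
{
	int error;

	assert(ip != NULL && free_blocks != NULL);
	assert(del_len <= ip->delayed_blks && del_len <= ip->quota_res);

	error = toy_reserve(free_blocks, extra_res);
	if (error)
		return error;	/* inode counters are still untouched */

	/* Infallible bookkeeping happens only after success. */
	ip->delayed_blks -= del_len;
	ip->quota_res -= del_len;
	return 0;
}
```

In this model a failed reservation leaves `delayed_blks` and `quota_res` exactly as they were, which is the property the reply asks for; whether the real function can be reordered this way depends on details the thread does not spell out.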
On Tue, Feb 20, 2024 at 10:47:12AM +1100, Dave Chinner wrote:
> Ok. If you delay the ip->i_delayed_blks and quota accounting until
> after the incore extent tree updates are done, this code doesn't
> need to undo anything and can just return an error. We should also
> keep in mind that an error here will likely cause a filesystem
> shutdown when the transaction is canceled....

Yes. However (as documented in the commit log), the only place where
I think it can actually happen is on buffered write errors, as "real"
hole punches always flush delalloc space first.

> FWIW, if we are going to do this for rt, we should probably also
> consider doing it for normal delalloc conversion when the indlen
> reservation runs out due to excessive fragmentation of large
> extents. Separate patch and all that, but it doesn't really make
> sense to me to only do this for RT when we know it is also needed in
> rare cases on non-rt workloads...

Can it happen for non-RT extents? That would assume the required new
indirect block reservation for splitting an extent would have to be
larger than the amount we punch out.
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 8a84b7f0b55f38..a137abf435eeba 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4912,6 +4912,30 @@ xfs_bmap_del_extent_delay(
 		WARN_ON_ONCE(!got_indlen || !new_indlen);
 		stolen = xfs_bmap_split_indlen(da_old, &got_indlen, &new_indlen,
 						       del->br_blockcount);
+		if (isrt && stolen) {
+			/*
+			 * Ugg, we can't just steal reservations from the data
+			 * blocks as the data blocks come from a different pool.
+			 *
+			 * So we have to try to increase our reservations here,
+			 * and if that fails we have to fail the unmap.  To
+			 * avoid that as much as possible dip into the reserve
+			 * pool.
+			 *
+			 * Note that in theory the user/group/project could
+			 * be over the quota limit in the meantime, thus we
+			 * force the quota accounting even if it was over the
+			 * limit.
+			 */
+			error = xfs_dec_fdblocks(mp, stolen, true);
+			if (error) {
+				ip->i_delayed_blks += del->br_blockcount;
+				xfs_trans_reserve_quota_nblks(NULL, ip, 0,
+						del->br_blockcount, true);
+				return error;
+			}
+			xfs_mod_delalloc(ip, 0, stolen);
+		}
 
 		got->br_startblock = nullstartblock((int)got_indlen);
 
@@ -4924,7 +4948,8 @@ xfs_bmap_del_extent_delay(
 		xfs_iext_insert(ip, icur, &new, state);
 
 		da_new = got_indlen + new_indlen - stolen;
-		del->br_blockcount -= stolen;
+		if (!isrt)
+			del->br_blockcount -= stolen;
 		break;
 	}
 
When xfs_bmap_del_extent_delay has to split an indirect block it tries
to steal blocks from the part that gets unmapped to increase the
indirect block reservation that now needs to cover for two extents
instead of one.

This works perfectly fine on the data device, where the data and
indirect blocks come from the same pool. It has no chance of working
when the inode sits on the RT device. To support re-enabling delalloc
for inodes on the RT device, make this behavior conditional on not
being for rt extents. For an RT extent try to allocate new blocks or
otherwise just give up.

Note that split of delalloc extents should only happen on writeback
failure, as for other kinds of hole punching we first write back all
data and thus convert the delalloc reservations covering the hole to
a real allocation.

Note that restoring a quota reservation is always a bit problematic,
but the force flag should take care of it. That is, if we actually
supported quota with the RT volume, which seems to not be the case
at the moment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)
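
The difference between the two cases described in the commit message can be shown with a toy user-space model. It is a rough sketch, not the kernel code: `toy_pools`, `toy_split`, `toy_shortfall` and `toy_split_delalloc` are hypothetical names, the reserve pool and quota handling are left out, and the shortfall calculation only approximates the first step of what `xfs_bmap_split_indlen` is described as doing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy state: free blocks on the data device only. */
struct toy_pools {
	uint64_t data_free;
};

/* Outcome of splitting one delalloc extent around a punched range. */
struct toy_split {
	uint64_t stolen;	/* extra indirect blocks obtained */
	uint64_t freed;		/* blocks handed back to the extent's own pool */
};

/* Extra indirect blocks wanted after the split, capped at avail. */
static uint64_t toy_shortfall(uint64_t old_res, uint64_t need1,
			      uint64_t need2, uint64_t avail)
{
	uint64_t need = need1 + need2;
	uint64_t want;

	if (old_res >= need)
		return 0;
	want = need - old_res;
	return want < avail ? want : avail;
}

static int toy_split_delalloc(struct toy_pools *p, bool is_rt,
			      uint64_t old_res, uint64_t need1,
			      uint64_t need2, uint64_t punched,
			      struct toy_split *out)
{
	uint64_t stolen = toy_shortfall(old_res, need1, need2, punched);

	assert(p != NULL && out != NULL);

	if (is_rt && stolen) {
		/* RT blocks cannot back indirect blocks: ask the data pool. */
		if (p->data_free < stolen)
			return -ENOSPC;	/* the unmap fails */
		p->data_free -= stolen;
		out->freed = punched;	/* whole range returns to the RT pool */
	} else {
		/* Same pool: keep stolen blocks out of the punched range. */
		out->freed = punched - stolen;
	}
	out->stolen = stolen;
	return 0;
}
```

With an old reservation of 4 blocks and two new extents needing 3 each, the model steals 2 blocks from the punched range on the data device, while for an RT extent it takes 2 blocks from `data_free` and returns the full punched range, or fails with `-ENOSPC` if the data pool cannot cover them.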