diff mbox

tr_ifree transaction log reservation calculation

Message ID 20171125232010.GW5858@dastard (mailing list archive)
State New, archived
Headers show

Commit Message

Dave Chinner Nov. 25, 2017, 11:20 p.m. UTC
On Fri, Nov 24, 2017 at 09:51:22AM -0500, Brian Foster wrote:
> On Fri, Nov 24, 2017 at 08:54:57AM +1100, Dave Chinner wrote:
> > On Thu, Nov 23, 2017 at 09:36:53AM -0500, Brian Foster wrote:
> > > On Thu, Nov 23, 2017 at 11:24:02AM +1100, Dave Chinner wrote:
> > > That sounds reasonable, but at the same time I'm a bit concerned that I
> > > think the ifree transaction should technically be refactored into a max
> > > of the pre/post tx roll reservations similar to the write reservation
> > > mentioned above. For example:
> > > 
> > > 	MAX(inode + inobt + finobt + agi + inode chunk + etc.,
> > > 	    allocfree for inode chunk + (allocfree * nr inobts));
> > 
> > Yeah, I think you're right in that we need to we need to look at
> > redefining the reservations at a higher level.
> > 
> > For this specific example, though, we have to take into account that
> > a finobt modification to mark an inode free can insert a new record
> > into the btree. Hence we have scope for a finobt block allocation
> > and tree split, so we need to have allocfree reservations in the
> > primary transaction.
> > 
> > Also, the inode btree modifications (and refcountbt, too) call
> > xfs_free_extent() directly - they don't use deferops free list that
> > is processed as an EFI/EFD op after the primary transaction has been
> > rolled.  i.e:
> > 
> > xfs_btree_delrec
> >   xfs_btree_free_block
> >     xfs_inobt_free_block
> >       xfs_free_extent
> > 
> > Hence the primary transaction reservation has to take into account
> > that operations that remove a record from the tree can cause and
> > extent to be freed. As such, the removal of an inode chunk - which
> > removes the record from both the inobt and finobt - can trigger
> > freespace tree modification directly.
> > 
> 
> Ah, because the inobt doesn't use the agfl. It looks like we try to do
> this in xfs_calc_finobt_res(), but it doesn't appear to be fully correct
> because the allocfree res is only accounted for transactions that "do
> not already account for free space btree modifications," according to
> the comment. I don't recall the exact reasoning there off the top of my
> head, but if I had to guess, this was probably just another instance of
> "bolt something on to the transaction that appears to follow the pattern
> of other transactions and doesn't explode."
> 
> Given that, it seems like the appropriate thing to do is in fact for the
> transaction to include an independent allocfree res for the inode chunk,
> the inobt and finobt. That justifies the existence of the current
> allocfree in the pre-roll part of the transaction and would increase the
> transaction size by one more allocfree in the finobt case (the inode
> chunk should technically be part of the post-roll calculation).
> 
> Actually, with addition of some proper documentation that may also
> obviate the need to do the whole pre/post tx roll thing because the
> first part of the transaction will obviously be larger than a single
> allocfree res for the second part (amazing how a little documentation to
> establish/assess intent can save us time from trying to work backwards
> from the transaction :P).
> 
> Ok, I think that gives me enough to go on to try and refactor at least
> the inobt tx res bits. I'll take a closer look next week, thanks for all
> of the feedback.

Actually, I wrote a patch as I was working all this out :p

Not complete, or anything like that, but I've attached it below
so you've got some idea of what I was thinking....

> > IOWs, the primary ifree transaction needs alloc/free reservations
> > for the modifications of both the inobt and the finobt, regardless
> > of anything else that goes on, so.....
> > 
> > > But that would end up making the transaction smaller than adding it all
> > > together. Of course, that may be fine with the agfl fixup below..
> > 
> > .... I doubt it will shrink if we take into account the multiple
> > direct alloc/free operations in the primary transaction properly. :/
> > 
> > Hmmmm - I'm wondering if we need to limit the max extents in a
> > defer-ops EFI/EFD item to 2 so we don't try to free all the defered
> > extent frees in a single EFI/EFD transaction pair. i.e. force it
> > to roll the transaction and get a new log reservation every two
> > extents being freed so we can limit the reservation size defered
> > extent freeing requires....
> 
> This sounds similar in principle to the other option I was thinking
> about with regard to dealing with the agfl fixup problem (rolling the
> tx). The difference is the driving logic: putting a hard limit on the
> number of per-tx operations vs. rolling in a particular agfl corner
> case.

Rolling the transaction at the AGFL level is problematic. The
transaction scope is defined way further up the call stack and we
can't roll transactions while we hold random logged objects locked
that need to stay locked across the entire allocation process.

Hence I don't think we can roll the transaction at the point where
we are modifying the AGFL simply because we don't have the context
available to ensure it can be done in a transactionally atomic and
deadlock free manner.

Cheers,

Dave.

Comments

Brian Foster Nov. 27, 2017, 6:46 p.m. UTC | #1
On Sun, Nov 26, 2017 at 10:20:10AM +1100, Dave Chinner wrote:
> On Fri, Nov 24, 2017 at 09:51:22AM -0500, Brian Foster wrote:
> > On Fri, Nov 24, 2017 at 08:54:57AM +1100, Dave Chinner wrote:
> > > On Thu, Nov 23, 2017 at 09:36:53AM -0500, Brian Foster wrote:
> > > > On Thu, Nov 23, 2017 at 11:24:02AM +1100, Dave Chinner wrote:
> > > > That sounds reasonable, but at the same time I'm a bit concerned that I
> > > > think the ifree transaction should technically be refactored into a max
> > > > of the pre/post tx roll reservations similar to the write reservation
> > > > mentioned above. For example:
> > > > 
> > > > 	MAX(inode + inobt + finobt + agi + inode chunk + etc.,
> > > > 	    allocfree for inode chunk + (allocfree * nr inobts));
> > > 
> > > Yeah, I think you're right in that we need to we need to look at
> > > redefining the reservations at a higher level.
> > > 
> > > For this specific example, though, we have to take into account that
> > > a finobt modification to mark an inode free can insert a new record
> > > into the btree. Hence we have scope for a finobt block allocation
> > > and tree split, so we need to have allocfree reservations in the
> > > primary transaction.
> > > 
> > > Also, the inode btree modifications (and refcountbt, too) call
> > > xfs_free_extent() directly - they don't use deferops free list that
> > > is processed as an EFI/EFD op after the primary transaction has been
> > > rolled.  i.e:
> > > 
> > > xfs_btree_delrec
> > >   xfs_btree_free_block
> > >     xfs_inobt_free_block
> > >       xfs_free_extent
> > > 
> > > Hence the primary transaction reservation has to take into account
> > > that operations that remove a record from the tree can cause and
> > > extent to be freed. As such, the removal of an inode chunk - which
> > > removes the record from both the inobt and finobt - can trigger
> > > freespace tree modification directly.
> > > 
> > 
> > Ah, because the inobt doesn't use the agfl. It looks like we try to do
> > this in xfs_calc_finobt_res(), but it doesn't appear to be fully correct
> > because the allocfree res is only accounted for transactions that "do
> > not already account for free space btree modifications," according to
> > the comment. I don't recall the exact reasoning there off the top of my
> > head, but if I had to guess, this was probably just another instance of
> > "bolt something on to the transaction that appears to follow the pattern
> > of other transactions and doesn't explode."
> > 
> > Given that, it seems like the appropriate thing to do is in fact for the
> > transaction to include an independent allocfree res for the inode chunk,
> > the inobt and finobt. That justifies the existence of the current
> > allocfree in the pre-roll part of the transaction and would increase the
> > transaction size by one more allocfree in the finobt case (the inode
> > chunk should technically be part of the post-roll calculation).
> > 
> > Actually, with addition of some proper documentation that may also
> > obviate the need to do the whole pre/post tx roll thing because the
> > first part of the transaction will obviously be larger than a single
> > allocfree res for the second part (amazing how a little documentation to
> > establish/assess intent can save us time from trying to work backwards
> > from the transaction :P).
> > 
> > Ok, I think that gives me enough to go on to try and refactor at least
> > the inobt tx res bits. I'll take a closer look next week, thanks for all
> > of the feedback.
> 
> Actually, I wrote a patch as I was working all this out :p
> 
> Not complete, or anything like that, but I've attached it below
> so you've got some idea of what I was thinking....
> 

Ok. While I get the basic idea, this isn't quite what I was thinking
with regard to the finobt bits. The current finobt reservation applies
the allocation part of the logic only when the higher level transaction
doesn't already include the allocfree res. This kind of handwaves away
the finobt allocfree res by assuming the existing transaction
reservation would cover the blocks alloc'd/freed by the finobt.

IOW, there isn't really a clear separation between the modify and
allocation cases for the finobt as there is for the inobt. The finobt
can add/remove a record in either of those cases. The requirement to add
an additional allocfree res with the finobt operation pretty much
invalidates the finobt modify case, I think.

Also, it looks like this makes some other changes that are not
immediately clear to me, squashes in a couple of the fixes I've already
made and doesn't actually add a new allocfree res in some of the
create/alloc calculations (but rather refactors the existing allocfree
res to use the new helper). It's not clear to me if the latter is
intentional..?

In any event, I'd like to keep the previous bugfixes separate from the
refactoring so I will post what I was thinking with regard to the
refactoring bits on top of the bugfix series. If I'm missing something
or you had a different approach, I'd appreciate if we could work off the
bugfixes as a baseline..

> > > IOWs, the primary ifree transaction needs alloc/free reservations
> > > for the modifications of both the inobt and the finobt, regardless
> > > of anything else that goes on, so.....
> > > 
> > > > But that would end up making the transaction smaller than adding it all
> > > > together. Of course, that may be fine with the agfl fixup below..
> > > 
> > > .... I doubt it will shrink if we take into account the multiple
> > > direct alloc/free operations in the primary transaction properly. :/
> > > 
> > > Hmmmm - I'm wondering if we need to limit the max extents in a
> > > defer-ops EFI/EFD item to 2 so we don't try to free all the defered
> > > extent frees in a single EFI/EFD transaction pair. i.e. force it
> > > to roll the transaction and get a new log reservation every two
> > > extents being freed so we can limit the reservation size defered
> > > extent freeing requires....
> > 
> > This sounds similar in principle to the other option I was thinking
> > about with regard to dealing with the agfl fixup problem (rolling the
> > tx). The difference is the driving logic: putting a hard limit on the
> > number of per-tx operations vs. rolling in a particular agfl corner
> > case.
> 
> Rolling the transaction at the AGFL level is problematic. The
> transaction scope is defined way further up the call stack and we
> can't roll transactions while we hold random logged objects locked
> that need to stay locked across the entire allocation process.
> 
> Hence I don't think we can roll the transaction at the point where
> we are modifying the AGFL simply because we don't have the context
> available to ensure it can be done in a transactionally atomic and
> deadlock free manner.
> 

Yes, the idea wasn't to explicitly roll the transaction down in the AGFL
code. Rather, refactor the code to allow the deferred ops infrastructure
to do so (so we relog the intent as well). This would most likely
require some ugly factoring to deal with the layering.

No matter, I think a hard restriction of a couple frees per transaction
is much more straightforward and thus a better idea.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> 
> xfs: inode transaciton reservations are whacky
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> Start hacking cleanups into the code, see if we can untangle the
> mess.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_trans_resv.c | 128 +++++++++++++++++++++++++----------------
>  1 file changed, 77 insertions(+), 51 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
> index 6bd916bd35e2..60d25d353fc0 100644
> --- a/fs/xfs/libxfs/xfs_trans_resv.c
> +++ b/fs/xfs/libxfs/xfs_trans_resv.c
> @@ -131,6 +131,40 @@ xfs_calc_inode_res(
>  		 2 * XFS_BMBT_BLOCK_LEN(mp));
>  }
>  
> +/*
> + * Allocation/freeing of a [f]inobt record requires:
> + *
> + * the inode btree: max depth * block size
> + * the allocation btrees: 2 trees * (max depth - 1) * block size
> + *
> + * This assumes that AGI/AGF/SB modifications are accounted for by the caller.
> + * Modification is assumed as the first block in the tree depth reservation.
> + */
> +static uint
> +xfs_calc_inobt_alloc_res(
> +	struct xfs_mount	*mp)
> +{
> +	return xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
> +	       xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
> +				XFS_FSB_TO_B(mp, 1));
> +}
> +
> +/*
> + * Modification of a [f]inobt record requires:
> + *
> + * the inode btree: max depth * block size
> + * the inode btree entry: block size
> + *
> + * This assumes that AGI/AGF/SB modifications are accounted for by the caller.
> + */
> +static uint
> +xfs_calc_inobt_modify_res(
> +	struct xfs_mount	*mp)
> +{
> +	return xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
> +	       (uint)XFS_FSB_TO_B(mp, 1);
> +}
> +
>  /*
>   * The free inode btree is a conditional feature and the log reservation
>   * requirements differ slightly from that of the traditional inode allocation
> @@ -146,30 +180,19 @@ xfs_calc_inode_res(
>   * 'alloc' param indicates to include the reservation for free space btree
>   * modifications on behalf of finobt modifications. This is required only for
>   * transactions that do not already account for free space btree modifications.
> - *
> - * the free inode btree: max depth * block size
> - * the allocation btrees: 2 trees * (max depth - 1) * block size
> - * the free inode btree entry: block size
>   */
>  STATIC uint
>  xfs_calc_finobt_res(
>  	struct xfs_mount	*mp,
> -	int			alloc,
> -	int			modify)
> +	bool			alloc)
>  {
> -	uint res;
> -
>  	if (!xfs_sb_version_hasfinobt(&mp->m_sb))
>  		return 0;
>  
> -	res = xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1));
> +	/* alloc implies modification */
>  	if (alloc)
> -		res += xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
> -					XFS_FSB_TO_B(mp, 1));
> -	if (modify)
> -		res += (uint)XFS_FSB_TO_B(mp, 1);
> -
> -	return res;
> +		return xfs_calc_inobt_alloc_res(mp);
> +	return xfs_calc_inobt_modify_res(mp);
>  }
>
>  /*
> @@ -232,8 +255,6 @@ xfs_calc_write_reservation(
>   *    the super block to reflect the freed blocks: sector size
>   *    worst case split in allocation btrees per extent assuming 4 extents:
>   *		4 exts * 2 trees * (2 * max depth - 1) * block size
> - *    the inode btree: max depth * blocksize
> - *    the allocation btrees: 2 trees * (max depth - 1) * block size
>   */
>  STATIC uint
>  xfs_calc_itruncate_reservation(
> @@ -245,12 +266,7 @@ xfs_calc_itruncate_reservation(
>  				      XFS_FSB_TO_B(mp, 1))),
>  		    (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
>  		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4),
> -				      XFS_FSB_TO_B(mp, 1)) +
> -		    xfs_calc_buf_res(5, 0) +
> -		    xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
> -				     XFS_FSB_TO_B(mp, 1)) +
> -		    xfs_calc_buf_res(2 + mp->m_ialloc_blks +
> -				     mp->m_in_maxlevels, 0)));
> +				      XFS_FSB_TO_B(mp, 1))));
>  }
>  
>  /*
> @@ -320,13 +336,13 @@ xfs_calc_link_reservation(
>  /*
>   * For adding an inode to unlinked list we can modify:
>   *    the agi hash list: sector size
> - *    the unlinked inode: inode size
> + *    the on-disk unlinked inode: inode cluster size
>   */
>  STATIC uint
>  xfs_calc_iunlink_add_reservation(xfs_mount_t *mp)
>  {
>  	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> -		xfs_calc_inode_res(mp, 1);
> +	       max_t(uint, XFS_FSB_TO_B(mp, 1), mp->m_inode_cluster_size);
>  }
>  
>  /*
> @@ -367,24 +383,26 @@ xfs_calc_remove_reservation(
>   *    the new inode: inode size
>   *    the inode btree entry: block size
>   *    the superblock for the nlink flag: sector size
> + *    the AGI for inode counters: sector size
>   *    the directory btree: (max depth + v2) * dir block size
>   *    the directory inode's bmap btree: (max depth + v2) * block size
> - *    the finobt (record modification and allocation btrees)
> + *    the finobt (record modification/removal and allocation btrees)
> + *    the AGF, AGFL for finobt allocation: 2 * sector size
>   */
>  STATIC uint
>  xfs_calc_create_resv_modify(
>  	struct xfs_mount	*mp)
>  {
>  	return xfs_calc_inode_res(mp, 2) +
> -		xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> -		(uint)XFS_FSB_TO_B(mp, 1) +
> +		xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
>  		xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1)) +
> -		xfs_calc_finobt_res(mp, 1, 1);
> +		xfs_calc_inobt_modify_res(mp) +
> +		xfs_calc_finobt_res(mp, true);
>  }
>  
>  /*
>   * For create we can allocate some inodes giving:
> - *    the agi and agf of the ag getting the new inodes: 2 * sectorsize
> + *    the agi, agf, agfl of the ag getting the new inodes: 3 * sectorsize
>   *    the superblock for the nlink flag: sector size
>   *    the inode blocks allocated: mp->m_ialloc_blks * blocksize
>   *    the inode btree: max depth * blocksize
> @@ -394,12 +412,9 @@ STATIC uint
>  xfs_calc_create_resv_alloc(
>  	struct xfs_mount	*mp)
>  {
> -	return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
> -		mp->m_sb.sb_sectsize +
> +	return xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
>  		xfs_calc_buf_res(mp->m_ialloc_blks, XFS_FSB_TO_B(mp, 1)) +
> -		xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
> -		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
> -				 XFS_FSB_TO_B(mp, 1));
> +		xfs_calc_inobt_alloc_res(mp);
>  }
>  
>  STATIC uint
> @@ -413,7 +428,7 @@ __xfs_calc_create_reservation(
>  
>  /*
>   * For icreate we can allocate some inodes giving:
> - *    the agi and agf of the ag getting the new inodes: 2 * sectorsize
> + *    the agi, agf, agfl of the ag getting the new inodes: 3 * sectorsize
>   *    the superblock for the nlink flag: sector size
>   *    the inode btree: max depth * blocksize
>   *    the allocation btrees: 2 trees * (max depth - 1) * block size
> @@ -423,12 +438,9 @@ STATIC uint
>  xfs_calc_icreate_resv_alloc(
>  	struct xfs_mount	*mp)
>  {
> -	return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
> -		mp->m_sb.sb_sectsize +
> -		xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
> -		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
> -				 XFS_FSB_TO_B(mp, 1)) +
> -		xfs_calc_finobt_res(mp, 0, 0);
> +	return xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
> +		xfs_calc_inobt_alloc_res(mp) +
> +		xfs_calc_finobt_res(mp, false);
>  }
>  
>  STATIC uint
> @@ -493,26 +505,40 @@ xfs_calc_symlink_reservation(
>   *    the super block free inode counter: sector size
>   *    the agi hash list and counters: sector size
>   *    the inode btree entry: block size
> + *    the on disk inode to null di_next_unlinked: inode cluster size
>   *    the on disk inode before ours in the agi hash list: inode cluster size
> + *
> + * If we also have to free an inode chunk or recrod
>   *    the inode btree: max depth * blocksize
> + *	Note: this also includes inode btree entry block!
> + *    the AGF and AGFL: 2 sectors
> + *    the allocation btrees: 2 trees * (max depth - 1) * block size
> + *    the log headers to record the inode chunk buffers as freed
> + *
> + * If we also have to modify the finbot - record may need allocation:
> + *    the finobt: max depth * block size
>   *    the allocation btrees: 2 trees * (max depth - 1) * block size
> - *    the finobt (record insertion, removal or modification)
> + *
>   */
>  STATIC uint
>  xfs_calc_ifree_reservation(
>  	struct xfs_mount	*mp)
>  {
>  	return XFS_DQUOT_LOGRES(mp) +
> +		/* inode being freed */
>  		xfs_calc_inode_res(mp, 1) +
> -		xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> -		xfs_calc_buf_res(1, XFS_FSB_TO_B(mp, 1)) +
> +		/* sb, AGF, AGFL */
> +		xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
> +		/* AGI + inode cluster */
>  		xfs_calc_iunlink_remove_reservation(mp) +
> -		xfs_calc_buf_res(1, 0) +
> -		xfs_calc_buf_res(2 + mp->m_ialloc_blks +
> -				 mp->m_in_maxlevels, 0) +
> -		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
> -				 XFS_FSB_TO_B(mp, 1)) +
> -		xfs_calc_finobt_res(mp, 0, 1);
> +		/* inode cluster on disk before ours */
> +		xfs_calc_iunlink_remove_reservation(mp) +
> +		/* inode btree record freeing */
> +		xfs_calc_inobt_alloc_res(mp) +
> +		/* stale inode chunk buffers (blf & log hdrs only) */
> +		xfs_calc_buf_res(mp->m_ialloc_blks, 0) +
> +		/* free inode btree */
> +		xfs_calc_finobt_res(mp, true);
>  }
>  
>  /*
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Nov. 27, 2017, 9:29 p.m. UTC | #2
On Mon, Nov 27, 2017 at 01:46:51PM -0500, Brian Foster wrote:
> On Sun, Nov 26, 2017 at 10:20:10AM +1100, Dave Chinner wrote:
> > On Fri, Nov 24, 2017 at 09:51:22AM -0500, Brian Foster wrote:
> > > On Fri, Nov 24, 2017 at 08:54:57AM +1100, Dave Chinner wrote:
> > > > On Thu, Nov 23, 2017 at 09:36:53AM -0500, Brian Foster wrote:
> > > > > On Thu, Nov 23, 2017 at 11:24:02AM +1100, Dave Chinner wrote:
> > > > > That sounds reasonable, but at the same time I'm a bit concerned that I
> > > > > think the ifree transaction should technically be refactored into a max
> > > > > of the pre/post tx roll reservations similar to the write reservation
> > > > > mentioned above. For example:
> > > > > 
> > > > > 	MAX(inode + inobt + finobt + agi + inode chunk + etc.,
> > > > > 	    allocfree for inode chunk + (allocfree * nr inobts));
> > > > 
> > > > Yeah, I think you're right in that we need to we need to look at
> > > > redefining the reservations at a higher level.
> > > > 
> > > > For this specific example, though, we have to take into account that
> > > > a finobt modification to mark an inode free can insert a new record
> > > > into the btree. Hence we have scope for a finobt block allocation
> > > > and tree split, so we need to have allocfree reservations in the
> > > > primary transaction.
> > > > 
> > > > Also, the inode btree modifications (and refcountbt, too) call
> > > > xfs_free_extent() directly - they don't use deferops free list that
> > > > is processed as an EFI/EFD op after the primary transaction has been
> > > > rolled.  i.e:
> > > > 
> > > > xfs_btree_delrec
> > > >   xfs_btree_free_block
> > > >     xfs_inobt_free_block
> > > >       xfs_free_extent
> > > > 
> > > > Hence the primary transaction reservation has to take into account
> > > > that operations that remove a record from the tree can cause and
> > > > extent to be freed. As such, the removal of an inode chunk - which
> > > > removes the record from both the inobt and finobt - can trigger
> > > > freespace tree modification directly.
> > > > 
> > > 
> > > Ah, because the inobt doesn't use the agfl. It looks like we try to do
> > > this in xfs_calc_finobt_res(), but it doesn't appear to be fully correct
> > > because the allocfree res is only accounted for transactions that "do
> > > not already account for free space btree modifications," according to
> > > the comment. I don't recall the exact reasoning there off the top of my
> > > head, but if I had to guess, this was probably just another instance of
> > > "bolt something on to the transaction that appears to follow the pattern
> > > of other transactions and doesn't explode."
> > > 
> > > Given that, it seems like the appropriate thing to do is in fact for the
> > > transaction to include an independent allocfree res for the inode chunk,
> > > the inobt and finobt. That justifies the existence of the current
> > > allocfree in the pre-roll part of the transaction and would increase the
> > > transaction size by one more allocfree in the finobt case (the inode
> > > chunk should technically be part of the post-roll calculation).
> > > 
> > > Actually, with addition of some proper documentation that may also
> > > obviate the need to do the whole pre/post tx roll thing because the
> > > first part of the transaction will obviously be larger than a single
> > > allocfree res for the second part (amazing how a little documentation to
> > > establish/assess intent can save us time from trying to work backwards
> > > from the transaction :P).
> > > 
> > > Ok, I think that gives me enough to go on to try and refactor at least
> > > the inobt tx res bits. I'll take a closer look next week, thanks for all
> > > of the feedback.
> > 
> > Actually, I wrote a patch as I was working all this out :p
> > 
> > Not complete, or anything like that, but I've attached it below
> > so you've got some idea of what I was thinking....
> > 
> 
> Ok. While I get the basic idea, this isn't quite what I was thinking
> with regard to the finobt bits. The current finobt reservation applies
> the allocation part of the logic only when the higher level transaction
> doesn't already include the allocfree res.

Which is wrong, as we've already discussed. If we are doing an
allocation of [f]inobt blocks, we need an allocfree reservation for
that specific operation.

> This kind of handwaves away
> the finobt allocfree res by assuming the existing transaction
> reservation would cover the blocks alloc'd/freed by the finobt.
>
> IOW, there isn't really a clear separation between the modify and
> allocation cases for the finobt as there is for the inobt. The finobt
> can add/remove a record in either of those cases. The requirement to add
> an additional allocfree res with the finobt operation pretty much
> invalidates the finobt modify case, I think.

Sure. What I did was separate allocation vs no allocation
(modification) in terms of what the tree operation requires. But
that doesn't mean we actually use the modification variant for
finobt if all operations we take reservations for could require
allocation.

It's the difference between "this is the reservation for a pure
record modification operation" vs "this is the reservation for a
record modification operation that *may* require record insertion or
deletion and hence allocation/freeing"

> Also, it looks like this makes some other changes that are not
> immediately clear to me,

Which ones?

> squashes in a couple of the fixes I've already
> made and doesn't actually add a new allocfree res in some of the
> create/alloc calculations (but rather refactors the existing allocfree
> res to use the new helper). It's not clear to me if the latter is
> intentional..?

Not sure what you are asking? I factored existing open-coded inobt
reservations to use helpers so that I didn't have to modify the
reservation in lots of places. Get it right in one place, all the
others are correct too. Indeed, most of the cases I factored already
had the allocfree reservation, so I'm not sure why we'd be adding a
new one to those reservations?

And, keep in mind that I did say "the patch is just what I wrote as
I discovered problems" before tearing into it about not being
complete a complete solution....

Cheers,

Dave.
Brian Foster Nov. 28, 2017, 1:28 p.m. UTC | #3
On Tue, Nov 28, 2017 at 08:29:46AM +1100, Dave Chinner wrote:
> On Mon, Nov 27, 2017 at 01:46:51PM -0500, Brian Foster wrote:
> > On Sun, Nov 26, 2017 at 10:20:10AM +1100, Dave Chinner wrote:
> > > On Fri, Nov 24, 2017 at 09:51:22AM -0500, Brian Foster wrote:
> > > > On Fri, Nov 24, 2017 at 08:54:57AM +1100, Dave Chinner wrote:
> > > > > On Thu, Nov 23, 2017 at 09:36:53AM -0500, Brian Foster wrote:
> > > > > > On Thu, Nov 23, 2017 at 11:24:02AM +1100, Dave Chinner wrote:
> > > > > > That sounds reasonable, but at the same time I'm a bit concerned that I
> > > > > > think the ifree transaction should technically be refactored into a max
> > > > > > of the pre/post tx roll reservations similar to the write reservation
> > > > > > mentioned above. For example:
> > > > > > 
> > > > > > 	MAX(inode + inobt + finobt + agi + inode chunk + etc.,
> > > > > > 	    allocfree for inode chunk + (allocfree * nr inobts));
> > > > > 
> > > > > Yeah, I think you're right in that we need to we need to look at
> > > > > redefining the reservations at a higher level.
> > > > > 
> > > > > For this specific example, though, we have to take into account that
> > > > > a finobt modification to mark an inode free can insert a new record
> > > > > into the btree. Hence we have scope for a finobt block allocation
> > > > > and tree split, so we need to have allocfree reservations in the
> > > > > primary transaction.
> > > > > 
> > > > > Also, the inode btree modifications (and refcountbt, too) call
> > > > > xfs_free_extent() directly - they don't use deferops free list that
> > > > > is processed as an EFI/EFD op after the primary transaction has been
> > > > > rolled.  i.e:
> > > > > 
> > > > > xfs_btree_delrec
> > > > >   xfs_btree_free_block
> > > > >     xfs_inobt_free_block
> > > > >       xfs_free_extent
> > > > > 
> > > > > Hence the primary transaction reservation has to take into account
> > > > > that operations that remove a record from the tree can cause and
> > > > > extent to be freed. As such, the removal of an inode chunk - which
> > > > > removes the record from both the inobt and finobt - can trigger
> > > > > freespace tree modification directly.
> > > > > 
> > > > 
> > > > Ah, because the inobt doesn't use the agfl. It looks like we try to do
> > > > this in xfs_calc_finobt_res(), but it doesn't appear to be fully correct
> > > > because the allocfree res is only accounted for transactions that "do
> > > > not already account for free space btree modifications," according to
> > > > the comment. I don't recall the exact reasoning there off the top of my
> > > > head, but if I had to guess, this was probably just another instance of
> > > > "bolt something on to the transaction that appears to follow the pattern
> > > > of other transactions and doesn't explode."
> > > > 
> > > > Given that, it seems like the appropriate thing to do is in fact for the
> > > > transaction to include an independent allocfree res for the inode chunk,
> > > > the inobt and finobt. That justifies the existence of the current
> > > > allocfree in the pre-roll part of the transaction and would increase the
> > > > transaction size by one more allocfree in the finobt case (the inode
> > > > chunk should technically be part of the post-roll calculation).
> > > > 
> > > > Actually, with addition of some proper documentation that may also
> > > > obviate the need to do the whole pre/post tx roll thing because the
> > > > first part of the transaction will obviously be larger than a single
> > > > allocfree res for the second part (amazing how a little documentation to
> > > > establish/assess intent can save us time from trying to work backwards
> > > > from the transaction :P).
> > > > 
> > > > Ok, I think that gives me enough to go on to try and refactor at least
> > > > the inobt tx res bits. I'll take a closer look next week, thanks for all
> > > > of the feedback.
> > > 
> > > Actually, I wrote a patch as I was working all this out :p
> > > 
> > > Not complete, or anything like that, but I've attached it below
> > > so you've got some idea of what I was thinking....
> > > 
> > 
> > Ok. While I get the basic idea, this isn't quite what I was thinking
> > with regard to the finobt bits. The current finobt reservation applies
> > the allocation part of the logic only when the higher level transaction
> > doesn't already include the allocfree res.
> 
> Which is wrong, as we've already discussed. If we are doing an
> allocation of [f]inobt blocks, we need an allocfree reservation for
> that specific operation.
> 

Yep, just restating the obvious..

> > This kind of handwaves away
> > the finobt allocfree res by assuming the existing transaction
> > reservation would cover the blocks alloc'd/freed by the finobt.
> >
> > IOW, there isn't really a clear separation between the modify and
> > allocation cases for the finobt as there is for the inobt. The finobt
> > can add/remove a record in either of those cases. The requirement to add
> > an additional allocfree res with the finobt operation pretty much
> > invalidates the finobt modify case, I think.
> 
> Sure. What I did was separate allocation vs no allocation
> (modification) in terms of what the tree operation requires. But
> that doesn't mean we actually use the modification variant for
> finobt if all operations we take reservations for could require
> allocation.
> 
> It's the difference between "this is the reservation for a pure
> record modification operation" vs "this is the reservation for a
> record modification operation that *may* require record insertion or
> deletion and hence allocation/freeing"
> 

Ok, I just think it's cleaner if that concept is wiped away wrt to
finobt. It's confusing to me, at least, and I probably wrote the code.
:P

> > Also, it looks like this makes some other changes that are not
> > immediately clear to me,
> 
> Which ones?
> 

The truncate and iunlink bits...

> > squashes in a couple of the fixes I've already
> > made and doesn't actually add a new allocfree res in some of the
> > create/alloc calculations (but rather refactors the existing allocfree
> > res to use the new helper). It's not clear to me if the latter is
> > intentional..?
> 
> Not sure what you are asking? I factored existing open-coded inobt
> reservations to use helpers so that I didn't have to modify the
> reservation in lots of places. Get it right in one place, all the
> others are correct too. Indeed, most of the cases I factored already
> had the allocfree reservation, so I'm not sure why we'd be adding a
> new one to those reservations?
> 

Understood wrt to the helpers. What I'm asking is why
xfs_calc_create_resv_alloc(), for example, doesn't actually add another
allocfree res if the intent was to add one for all inode tree related
ops..? Using that example, my understanding is that tx should now have
an allocfree res for the inode chunk allocation and a new one for the
inobt insertion. The latter is the one that's now abstracted into the
helpers, yes..? That's the model I tried to follow for the rfc patch,
anyways.

> And, keep in mind that I did say "the patch is just what I wrote as
> I discovered problems" before tearing into it about not being
> complete a complete solution....
> 

Yeah, I didn't look too hard beyond trying to understand intent.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Nov. 28, 2017, 9:34 p.m. UTC | #4
On Tue, Nov 28, 2017 at 08:28:02AM -0500, Brian Foster wrote:
> On Tue, Nov 28, 2017 at 08:29:46AM +1100, Dave Chinner wrote:
> > On Mon, Nov 27, 2017 at 01:46:51PM -0500, Brian Foster wrote:
> > > Also, it looks like this makes some other changes that are not
> > > immediately clear to me,
> > 
> > Which ones?
> > 
> 
> The truncate and iunlink bits...

The truncate reservation includes a reservation for an inobt
modification. We *never* modify the inobt inside a truncate
transaction, so even if we ignore the fact it was wrong (like all
the others we are fixing), it really shouldn't be there.

THe commit message from the archive tree doesn't help explain it,
either. It just says (paraphrasing) "fix transaction reservation".
So I removed it.

As for the iunlink bit, you mean this the fact I added the on disk
inode cluster to the reservation in
xfs_calc_iunlink_add_reservation()?

That's because we log the inode cluster in unlink when we modify the
inode unlinked list. That's missing from all the unlinked inode
manipulation reservations - some of them take into account modifying
the inode cluster of another inode in the unlinked list (e.g. ifree)
but they all fail to reserve space for logging the unlinked list
pointer in the inode being unlinked/freed.

> > > squashes in a couple of the fixes I've already
> > > made and doesn't actually add a new allocfree res in some of the
> > > create/alloc calculations (but rather refactors the existing allocfree
> > > res to use the new helper). It's not clear to me if the latter is
> > > intentional..?
> > 
> > Not sure what you are asking? I factored existing open-coded inobt
> > reservations to use helpers so that I didn't have to modify the
> > reservation in lots of places. Get it right in one place, all the
> > others are correct too. Indeed, most of the cases I factored already
> > had the allocfree reservation, so I'm not sure why we'd be adding a
> > new one to those reservations?
> > 
> 
> Understood wrt to the helpers. What I'm asking is why
> xfs_calc_create_resv_alloc(), for example, doesn't actually add another
> allocfree res if the intent was to add one for all inode tree related
> ops..?

Because the patch wasn't complete and finished and there's no
guarantee I factored it correctly. You're still reading that patch
as though it was a completely polished, finished patch, not a set of
"noticed these things while thinking about the problem" notes.

I factored the code first, never went back the create
code because we were focussed on the problems in the iunlink
reservations....

> Using that example, my understanding is that tx should now have
> an allocfree res for the inode chunk allocation and a new one for the
> inobt insertion. The latter is the one that's now abstracted into the
> helpers, yes..? That's the model I tried to follow for the rfc patch,
> anyways.

Sure, that's what needs to be done, as we've discussed.

Cheers,

Dave.
Brian Foster Nov. 29, 2017, 2:31 p.m. UTC | #5
On Wed, Nov 29, 2017 at 08:34:08AM +1100, Dave Chinner wrote:
> On Tue, Nov 28, 2017 at 08:28:02AM -0500, Brian Foster wrote:
> > On Tue, Nov 28, 2017 at 08:29:46AM +1100, Dave Chinner wrote:
> > > On Mon, Nov 27, 2017 at 01:46:51PM -0500, Brian Foster wrote:
> > > > Also, it looks like this makes some other changes that are not
> > > > immediately clear to me,
> > > 
> > > Which ones?
> > > 
> > 
> > The truncate and iunlink bits...
> 
> The truncate reservation includes a reservation for an inobt
> modification. We *never* modify the inobt inside a truncate
> transaction, so even if we ignore the fact it was wrong (like all
> the others we are fixing), it really shouldn't be there.
> 
> THe commit message from the archive tree doesn't help explain it,
> either. It just says (paraphrasing) "fix transaction reservation".
> So I removed it.
> 

Ok..

> As for the iunlink bit, you mean this the fact I added the on disk
> inode cluster to the reservation in
> xfs_calc_iunlink_add_reservation()?
> 
> That's because we log the inode cluster in unlink when we modify the
> inode unlinked list. That's missing from all the unlinked inode
> manipulation reservations - some of them take into account modifying
> the inode cluster of another inode in the unlinked list (e.g. ifree)
> but they all fail to reserve space for logging the unlinked list
> pointer in the inode being unlinked/freed.
> 

This is clear to me now via the other thread. I'll incorporate both of
these fixes, thanks.

> > > > squashes in a couple of the fixes I've already
> > > > made and doesn't actually add a new allocfree res in some of the
> > > > create/alloc calculations (but rather refactors the existing allocfree
> > > > res to use the new helper). It's not clear to me if the latter is
> > > > intentional..?
> > > 
> > > Not sure what you are asking? I factored existing open-coded inobt
> > > reservations to use helpers so that I didn't have to modify the
> > > reservation in lots of places. Get it right in one place, all the
> > > others are correct too. Indeed, most of the cases I factored already
> > > had the allocfree reservation, so I'm not sure why we'd be adding a
> > > new one to those reservations?
> > > 
> > 
> > Understood wrt to the helpers. What I'm asking is why
> > xfs_calc_create_resv_alloc(), for example, doesn't actually add another
> > allocfree res if the intent was to add one for all inode tree related
> > ops..?
> 
> Because the patch wasn't complete and finished and there's no
> guarantee I factored it correctly. You're still reading that patch
> as though it was a completely polished, finished patch, not a set of
> "noticed these things while thinking about the problem" notes.
> 

Nope.. I skimmed it and was confused about a hunk that was reworked in a
way that conflicted with my understanding of the intent. It's really
that simple. I appreciate the psychological evaluation, but I think "no
that's not intentional, it's just a bug/incomplete" would have sufficed.
;)

Brian

> I factored the code first, never went back the create
> code because we were focussed on the problems in the iunlink
> reservations....
> 
> > Using that example, my understanding is that tx should now have
> > an allocfree res for the inode chunk allocation and a new one for the
> > inobt insertion. The latter is the one that's now abstracted into the
> > helpers, yes..? That's the model I tried to follow for the rfc patch,
> > anyways.
> 
> Sure, that's what needs to be done, as we've discussed.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 6bd916bd35e2..60d25d353fc0 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -131,6 +131,40 @@  xfs_calc_inode_res(
 		 2 * XFS_BMBT_BLOCK_LEN(mp));
 }
 
+/*
+ * Allocation/freeing of a [f]inobt record requires:
+ *
+ * the inode btree: max depth * block size
+ * the allocation btrees: 2 trees * (max depth - 1) * block size
+ *
+ * This assumes that AGI/AGF/SB modifications are accounted for by the caller.
+ * Modification is assumed as the first block in the tree depth reservation.
+ */
+static uint
+xfs_calc_inobt_alloc_res(
+	struct xfs_mount	*mp)
+{
+	return xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
+	       xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
+				XFS_FSB_TO_B(mp, 1));
+}
+
+/*
+ * Modification of a [f]inobt record requires:
+ *
+ * the inode btree: max depth * block size
+ * the inode btree entry: block size
+ *
+ * This assumes that AGI/AGF/SB modifications are accounted for by the caller.
+ */
+static uint
+xfs_calc_inobt_modify_res(
+	struct xfs_mount	*mp)
+{
+	return xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
+	       (uint)XFS_FSB_TO_B(mp, 1);
+}
+
 /*
  * The free inode btree is a conditional feature and the log reservation
  * requirements differ slightly from that of the traditional inode allocation
@@ -146,30 +180,19 @@  xfs_calc_inode_res(
  * 'alloc' param indicates to include the reservation for free space btree
  * modifications on behalf of finobt modifications. This is required only for
  * transactions that do not already account for free space btree modifications.
- *
- * the free inode btree: max depth * block size
- * the allocation btrees: 2 trees * (max depth - 1) * block size
- * the free inode btree entry: block size
  */
 STATIC uint
 xfs_calc_finobt_res(
 	struct xfs_mount	*mp,
-	int			alloc,
-	int			modify)
+	bool			alloc)
 {
-	uint res;
-
 	if (!xfs_sb_version_hasfinobt(&mp->m_sb))
 		return 0;
 
-	res = xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1));
+	/* alloc implies modification */
 	if (alloc)
-		res += xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-					XFS_FSB_TO_B(mp, 1));
-	if (modify)
-		res += (uint)XFS_FSB_TO_B(mp, 1);
-
-	return res;
+		return xfs_calc_inobt_alloc_res(mp);
+	return xfs_calc_inobt_modify_res(mp);
 }
 
 /*
@@ -232,8 +255,6 @@  xfs_calc_write_reservation(
  *    the super block to reflect the freed blocks: sector size
  *    worst case split in allocation btrees per extent assuming 4 extents:
  *		4 exts * 2 trees * (2 * max depth - 1) * block size
- *    the inode btree: max depth * blocksize
- *    the allocation btrees: 2 trees * (max depth - 1) * block size
  */
 STATIC uint
 xfs_calc_itruncate_reservation(
@@ -245,12 +266,7 @@  xfs_calc_itruncate_reservation(
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
 		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4),
-				      XFS_FSB_TO_B(mp, 1)) +
-		    xfs_calc_buf_res(5, 0) +
-		    xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				     XFS_FSB_TO_B(mp, 1)) +
-		    xfs_calc_buf_res(2 + mp->m_ialloc_blks +
-				     mp->m_in_maxlevels, 0)));
+				      XFS_FSB_TO_B(mp, 1))));
 }
 
 /*
@@ -320,13 +336,13 @@  xfs_calc_link_reservation(
 /*
  * For adding an inode to unlinked list we can modify:
  *    the agi hash list: sector size
- *    the unlinked inode: inode size
+ *    the on-disk unlinked inode: inode cluster size
  */
 STATIC uint
 xfs_calc_iunlink_add_reservation(xfs_mount_t *mp)
 {
 	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-		xfs_calc_inode_res(mp, 1);
+	       max_t(uint, XFS_FSB_TO_B(mp, 1), mp->m_inode_cluster_size);
 }
 
 /*
@@ -367,24 +383,26 @@  xfs_calc_remove_reservation(
  *    the new inode: inode size
  *    the inode btree entry: block size
  *    the superblock for the nlink flag: sector size
+ *    the AGI for inode counters: sector size
  *    the directory btree: (max depth + v2) * dir block size
  *    the directory inode's bmap btree: (max depth + v2) * block size
- *    the finobt (record modification and allocation btrees)
+ *    the finobt (record modification/removal and allocation btrees)
+ *    the AGF, AGFL for finobt allocation: 2 * sector size
  */
 STATIC uint
 xfs_calc_create_resv_modify(
 	struct xfs_mount	*mp)
 {
 	return xfs_calc_inode_res(mp, 2) +
-		xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-		(uint)XFS_FSB_TO_B(mp, 1) +
+		xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
 		xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_finobt_res(mp, 1, 1);
+		xfs_calc_inobt_modify_res(mp) +
+		xfs_calc_finobt_res(mp, true);
 }
 
 /*
  * For create we can allocate some inodes giving:
- *    the agi and agf of the ag getting the new inodes: 2 * sectorsize
+ *    the agi, agf, agfl of the ag getting the new inodes: 3 * sectorsize
  *    the superblock for the nlink flag: sector size
  *    the inode blocks allocated: mp->m_ialloc_blks * blocksize
  *    the inode btree: max depth * blocksize
@@ -394,12 +412,9 @@  STATIC uint
 xfs_calc_create_resv_alloc(
 	struct xfs_mount	*mp)
 {
-	return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
-		mp->m_sb.sb_sectsize +
+	return xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
 		xfs_calc_buf_res(mp->m_ialloc_blks, XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				 XFS_FSB_TO_B(mp, 1));
+		xfs_calc_inobt_alloc_res(mp);
 }
 
 STATIC uint
@@ -413,7 +428,7 @@  __xfs_calc_create_reservation(
 
 /*
  * For icreate we can allocate some inodes giving:
- *    the agi and agf of the ag getting the new inodes: 2 * sectorsize
+ *    the agi, agf, agfl of the ag getting the new inodes: 3 * sectorsize
  *    the superblock for the nlink flag: sector size
  *    the inode btree: max depth * blocksize
  *    the allocation btrees: 2 trees * (max depth - 1) * block size
@@ -423,12 +438,9 @@  STATIC uint
 xfs_calc_icreate_resv_alloc(
 	struct xfs_mount	*mp)
 {
-	return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
-		mp->m_sb.sb_sectsize +
-		xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				 XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_finobt_res(mp, 0, 0);
+	return xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
+		xfs_calc_inobt_alloc_res(mp) +
+		xfs_calc_finobt_res(mp, false);
 }
 
 STATIC uint
@@ -493,26 +505,40 @@  xfs_calc_symlink_reservation(
  *    the super block free inode counter: sector size
  *    the agi hash list and counters: sector size
  *    the inode btree entry: block size
+ *    the on disk inode to null di_next_unlinked: inode cluster size
  *    the on disk inode before ours in the agi hash list: inode cluster size
+ *
+ * If we also have to free an inode chunk or recrod
  *    the inode btree: max depth * blocksize
+ *	Note: this also includes inode btree entry block!
+ *    the AGF and AGFL: 2 sectors
+ *    the allocation btrees: 2 trees * (max depth - 1) * block size
+ *    the log headers to record the inode chunk buffers as freed
+ *
+ * If we also have to modify the finbot - record may need allocation:
+ *    the finobt: max depth * block size
  *    the allocation btrees: 2 trees * (max depth - 1) * block size
- *    the finobt (record insertion, removal or modification)
+ *
  */
 STATIC uint
 xfs_calc_ifree_reservation(
 	struct xfs_mount	*mp)
 {
 	return XFS_DQUOT_LOGRES(mp) +
+		/* inode being freed */
 		xfs_calc_inode_res(mp, 1) +
-		xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-		xfs_calc_buf_res(1, XFS_FSB_TO_B(mp, 1)) +
+		/* sb, AGF, AGFL */
+		xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
+		/* AGI + inode cluster */
 		xfs_calc_iunlink_remove_reservation(mp) +
-		xfs_calc_buf_res(1, 0) +
-		xfs_calc_buf_res(2 + mp->m_ialloc_blks +
-				 mp->m_in_maxlevels, 0) +
-		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				 XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_finobt_res(mp, 0, 1);
+		/* inode cluster on disk before ours */
+		xfs_calc_iunlink_remove_reservation(mp) +
+		/* inode btree record freeing */
+		xfs_calc_inobt_alloc_res(mp) +
+		/* stale inode chunk buffers (blf & log hdrs only) */
+		xfs_calc_buf_res(mp->m_ialloc_blks, 0) +
+		/* free inode btree */
+		xfs_calc_finobt_res(mp, true);
 }
 
 /*