diff mbox series

[13/13] xfs: reorder iunlink remove operation in xfs_ifree

Message ID 20200812092556.2567285-14-david@fromorbit.com (mailing list archive)
State Accepted
Headers show
Series xfs: in memory inode unlink log items | expand

Commit Message

Dave Chinner Aug. 12, 2020, 9:25 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

The O_TMPFILE creation implementation creates a specific order of
operations for inode allocation/freeing and unlinked list
modification. Currently both are serialised by the AGI, so the order
doesn't strictly matter as long as the are both in the same
transaction.

However, if we want to move the unlinked list insertions largely
out from under the AGI lock, then we have to be concerned about the
order in which we do unlinked list modification operations.
O_TMPFILE creation tells us this order is inode allocation/free,
then unlinked list modification.

Change xfs_ifree() to use this same ordering on unlinked list
removal. THis way we always guarantee that when we enter the
iunlinked list removal code from this path, we have the already
locked and we don't have to worry about lock nesting AGI reads
inside unlink list locks because it's already locked and attached to
the transaction.

We can do this safely as the inode freeing and unlinked list removal
are done in the same transaction and hence are atomic operations
with resepect to log recovery.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_inode.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

Comments

Gao Xiang Aug. 12, 2020, 11:12 a.m. UTC | #1
On Wed, Aug 12, 2020 at 07:25:56PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The O_TMPFILE creation implementation creates a specific order of
> operations for inode allocation/freeing and unlinked list
> modification. Currently both are serialised by the AGI, so the order
> doesn't strictly matter as long as the are both in the same
> transaction.
> 
> However, if we want to move the unlinked list insertions largely
> out from under the AGI lock, then we have to be concerned about the
> order in which we do unlinked list modification operations.
> O_TMPFILE creation tells us this order is inode allocation/free,
> then unlinked list modification.
> 
> Change xfs_ifree() to use this same ordering on unlinked list
> removal. THis way we always guarantee that when we enter the
> iunlinked list removal code from this path, we have the already
> locked and we don't have to worry about lock nesting AGI reads
> inside unlink list locks because it's already locked and attached to
> the transaction.
> 
> We can do this safely as the inode freeing and unlinked list removal
> are done in the same transaction and hence are atomic operations
> with resepect to log recovery.

Yeah, due to all these constraints, such reorder is much cleaner,
otherwise it needs forcely taking AGI lock in xfs_iunlink_remove()
in advance as what I did in my new v3 ( due to exist AGI lock in
xfs_difree() )...
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/linux.git/tree/fs/xfs/xfs_inode.c?h=xfs/iunlink_opt_v3#n2511
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/linux.git/commit/fs/xfs/xfs_inode.c?h=xfs/iunlink_opt_v3&id=79a6a18a7f13d12726c2554e2581a56fc473b152

Since the new based patchset is out, I will look into this patchset
and skip sending out my v3 (looks like the previous logging order
issues has been resolved) and directly rebase the rest patches
into v4.

Thanks,
Gao Xiang

> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_inode.c | 22 ++++++++++++----------
>  1 file changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index ce128ff12762..7ee778bcde06 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -2283,14 +2283,13 @@ xfs_ifree_cluster(
>  }
>  
>  /*
> - * This is called to return an inode to the inode free list.
> - * The inode should already be truncated to 0 length and have
> - * no pages associated with it.  This routine also assumes that
> - * the inode is already a part of the transaction.
> + * This is called to return an inode to the inode free list.  The inode should
> + * already be truncated to 0 length and have no pages associated with it.  This
> + * routine also assumes that the inode is already a part of the transaction.
>   *
> - * The on-disk copy of the inode will have been added to the list
> - * of unlinked inodes in the AGI. We need to remove the inode from
> - * that list atomically with respect to freeing it here.
> + * The on-disk copy of the inode will have been added to the list of unlinked
> + * inodes in the AGI. We need to remove the inode from that list atomically with
> + * respect to freeing it here.
>   */
>  int
>  xfs_ifree(
> @@ -2308,13 +2307,16 @@ xfs_ifree(
>  	ASSERT(ip->i_d.di_nblocks == 0);
>  
>  	/*
> -	 * Pull the on-disk inode from the AGI unlinked list.
> +	 * Free the inode first so that we guarantee that the AGI lock is going
> +	 * to be taken before we remove the inode from the unlinked list. This
> +	 * makes the AGI lock -> unlinked list modification order the same as
> +	 * used in O_TMPFILE creation.
>  	 */
> -	error = xfs_iunlink_remove(tp, ip);
> +	error = xfs_difree(tp, ip->i_ino, &xic);
>  	if (error)
>  		return error;
>  
> -	error = xfs_difree(tp, ip->i_ino, &xic);
> +	error = xfs_iunlink_remove(tp, ip);
>  	if (error)
>  		return error;
>  
> -- 
> 2.26.2.761.g0e0b3e54be
>
Darrick J. Wong Aug. 19, 2020, 12:45 a.m. UTC | #2
On Wed, Aug 12, 2020 at 07:25:56PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The O_TMPFILE creation implementation creates a specific order of
> operations for inode allocation/freeing and unlinked list
> modification. Currently both are serialised by the AGI, so the order
> doesn't strictly matter as long as the are both in the same
> transaction.
> 
> However, if we want to move the unlinked list insertions largely
> out from under the AGI lock, then we have to be concerned about the
> order in which we do unlinked list modification operations.
> O_TMPFILE creation tells us this order is inode allocation/free,
> then unlinked list modification.
> 
> Change xfs_ifree() to use this same ordering on unlinked list
> removal. THis way we always guarantee that when we enter the

"This"...

> iunlinked list removal code from this path, we have the already

"have the already locked" ... what do we have locked?  The AGI?

> locked and we don't have to worry about lock nesting AGI reads
> inside unlink list locks because it's already locked and attached to
> the transaction.
> 
> We can do this safely as the inode freeing and unlinked list removal
> are done in the same transaction and hence are atomic operations
> with resepect to log recovery.

"respect"...

With the commit log edited a bit,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_inode.c | 22 ++++++++++++----------
>  1 file changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index ce128ff12762..7ee778bcde06 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -2283,14 +2283,13 @@ xfs_ifree_cluster(
>  }
>  
>  /*
> - * This is called to return an inode to the inode free list.
> - * The inode should already be truncated to 0 length and have
> - * no pages associated with it.  This routine also assumes that
> - * the inode is already a part of the transaction.
> + * This is called to return an inode to the inode free list.  The inode should
> + * already be truncated to 0 length and have no pages associated with it.  This
> + * routine also assumes that the inode is already a part of the transaction.
>   *
> - * The on-disk copy of the inode will have been added to the list
> - * of unlinked inodes in the AGI. We need to remove the inode from
> - * that list atomically with respect to freeing it here.
> + * The on-disk copy of the inode will have been added to the list of unlinked
> + * inodes in the AGI. We need to remove the inode from that list atomically with
> + * respect to freeing it here.
>   */
>  int
>  xfs_ifree(
> @@ -2308,13 +2307,16 @@ xfs_ifree(
>  	ASSERT(ip->i_d.di_nblocks == 0);
>  
>  	/*
> -	 * Pull the on-disk inode from the AGI unlinked list.
> +	 * Free the inode first so that we guarantee that the AGI lock is going
> +	 * to be taken before we remove the inode from the unlinked list. This
> +	 * makes the AGI lock -> unlinked list modification order the same as
> +	 * used in O_TMPFILE creation.
>  	 */
> -	error = xfs_iunlink_remove(tp, ip);
> +	error = xfs_difree(tp, ip->i_ino, &xic);
>  	if (error)
>  		return error;
>  
> -	error = xfs_difree(tp, ip->i_ino, &xic);
> +	error = xfs_iunlink_remove(tp, ip);
>  	if (error)
>  		return error;
>  
> -- 
> 2.26.2.761.g0e0b3e54be
>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index ce128ff12762..7ee778bcde06 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2283,14 +2283,13 @@  xfs_ifree_cluster(
 }
 
 /*
- * This is called to return an inode to the inode free list.
- * The inode should already be truncated to 0 length and have
- * no pages associated with it.  This routine also assumes that
- * the inode is already a part of the transaction.
+ * This is called to return an inode to the inode free list.  The inode should
+ * already be truncated to 0 length and have no pages associated with it.  This
+ * routine also assumes that the inode is already a part of the transaction.
  *
- * The on-disk copy of the inode will have been added to the list
- * of unlinked inodes in the AGI. We need to remove the inode from
- * that list atomically with respect to freeing it here.
+ * The on-disk copy of the inode will have been added to the list of unlinked
+ * inodes in the AGI. We need to remove the inode from that list atomically with
+ * respect to freeing it here.
  */
 int
 xfs_ifree(
@@ -2308,13 +2307,16 @@  xfs_ifree(
 	ASSERT(ip->i_d.di_nblocks == 0);
 
 	/*
-	 * Pull the on-disk inode from the AGI unlinked list.
+	 * Free the inode first so that we guarantee that the AGI lock is going
+	 * to be taken before we remove the inode from the unlinked list. This
+	 * makes the AGI lock -> unlinked list modification order the same as
+	 * used in O_TMPFILE creation.
 	 */
-	error = xfs_iunlink_remove(tp, ip);
+	error = xfs_difree(tp, ip->i_ino, &xic);
 	if (error)
 		return error;
 
-	error = xfs_difree(tp, ip->i_ino, &xic);
+	error = xfs_iunlink_remove(tp, ip);
 	if (error)
 		return error;