[v14,04/15] xfs: Add delay ready attr remove routines

Message ID 20201218072917.16805-5-allison.henderson@oracle.com (mailing list archive)
State Superseded
Series xfs: Delayed Attributes

Commit Message

Allison Henderson Dec. 18, 2020, 7:29 a.m. UTC
This patch modifies the attr remove routines to be delay ready. This
means they no longer roll or commit transactions, but instead return
-EAGAIN to have the calling routine roll and refresh the transaction. In
this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
uses a state-machine-like switch to keep track of where it was when
-EAGAIN was returned. xfs_attr_node_removename has also been modified to
use the switch, and a new version of xfs_attr_remove_args consists of a
simple loop to refresh the transaction until the operation is completed.
A new XFS_DAC_DEFER_FINISH flag is used to finish the transaction
wherever the existing code used to.

Calls to xfs_attr_rmtval_remove are replaced with the delay ready
version __xfs_attr_rmtval_remove. We will rename
__xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
done.

xfs_attr_rmtval_remove itself is still in use by the set routines (used
during a rename).  To preserve existing behavior, we modify
xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is set,
similar to what xfs_attr_remove_args does here.  Once we transition the
set routines to be delay ready, xfs_attr_rmtval_remove will no longer be
used and will be removed.

This patch also adds a new struct xfs_delattr_context, which we will use
to keep track of the current state of an attribute operation. The new
xfs_delattr_state enum is used to track various operations that are in
progress so that we know not to repeat them, and can resume where we left
off before -EAGAIN was returned to cycle out the transaction. Other
members take the place of local variables that need to retain their
values across repeated calls of the function.  See xfs_attr.h for a more
detailed diagram of the states.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c        | 218 +++++++++++++++++++++++++++++-----------
 fs/xfs/libxfs/xfs_attr.h        | 100 ++++++++++++++++++
 fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
 fs/xfs/libxfs/xfs_attr_remote.c |  48 +++++----
 fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
 fs/xfs/xfs_attr_inactive.c      |   2 +-
 6 files changed, 288 insertions(+), 84 deletions(-)
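
The retry pattern described in the commit message can be sketched in plain
userspace C. This is only a minimal model under stated assumptions, not the
kernel code: every demo_* name, the fields of struct demo_ctx, and the
counters are hypothetical stand-ins for xfs_delattr_context,
xfs_attr_remove_iter, xfs_attr_trans_roll and xfs_attr_remove_args, and the
"transaction" is reduced to two counters.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for xfs_delattr_state and XFS_DAC_DEFER_FINISH. */
enum demo_state { DEMO_UNINIT = 0, DEMO_SHRINK };
#define DEMO_DEFER_FINISH	0x01

/* Hypothetical context; it survives across calls, unlike local variables. */
struct demo_ctx {
	int		blocks_left;	/* remote blocks still to unmap */
	bool		need_join;	/* tree must be collapsed afterwards */
	unsigned int	flags;
	enum demo_state	state;
	int		rolls;		/* plain transaction rolls performed */
	int		finishes;	/* defer-finish calls performed */
	bool		shrunk;		/* final shrink step has run */
};

/* One step of the operation; returns -EAGAIN when it wants to be recalled. */
static int demo_remove_iter(struct demo_ctx *ctx)
{
	switch (ctx->state) {
	case DEMO_UNINIT:
		if (ctx->blocks_left > 0) {
			ctx->blocks_left--;
			if (ctx->blocks_left > 0) {
				/* No explicit state: the count tells us where we are. */
				ctx->flags |= DEMO_DEFER_FINISH;
				return -EAGAIN;
			}
		}
		if (ctx->need_join) {
			/* Remember to resume at the shrink step next time. */
			ctx->flags |= DEMO_DEFER_FINISH;
			ctx->state = DEMO_SHRINK;
			return -EAGAIN;
		}
		/* fall through */
	case DEMO_SHRINK:
		ctx->shrunk = true;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Models the roll helper: finish deferred work if asked, else just roll. */
static int demo_trans_roll(struct demo_ctx *ctx)
{
	if (ctx->flags & DEMO_DEFER_FINISH) {
		ctx->flags &= ~DEMO_DEFER_FINISH;
		ctx->finishes++;
	} else {
		ctx->rolls++;
	}
	return 0;
}

/* The simple caller loop: recall the step until it stops asking for -EAGAIN. */
static int demo_remove(struct demo_ctx *ctx)
{
	int error;

	for (;;) {
		error = demo_remove_iter(ctx);
		if (error != -EAGAIN)
			break;
		error = demo_trans_roll(ctx);
		if (error)
			break;
	}
	return error;
}
```

Calling demo_remove() on a context with three blocks and need_join set takes
four passes through demo_remove_iter(): two that unmap a block and ask to be
recalled, one that unmaps the last block and requests the join, and a final
one that resumes directly at the shrink step.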

Comments

Chandan Babu R Dec. 22, 2020, 7:22 a.m. UTC | #1
On Fri, 18 Dec 2020 00:29:06 -0700, Allison Henderson wrote:
> This patch modifies the attr remove routines to be delay ready. This
> means they no longer roll or commit transactions, but instead return
> -EAGAIN to have the calling routine roll and refresh the transaction. In
> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> uses a sort of state machine like switch to keep track of where it was
> when EAGAIN was returned. xfs_attr_node_removename has also been
> modified to use the switch, and a new version of xfs_attr_remove_args
> consists of a simple loop to refresh the transaction until the operation
> is completed. A new XFS_DAC_DEFER_FINISH flag is used to finish the
> transaction where ever the existing code used to.
> 
> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> version __xfs_attr_rmtval_remove. We will rename
> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> done.
> 
> xfs_attr_rmtval_remove itself is still in use by the set routines (used
> during a rename).  For reasons of preserving existing function, we
> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> used and will be removed.
> 
> This patch also adds a new struct xfs_delattr_context, which we will use
> to keep track of the current state of an attribute operation. The new
> xfs_delattr_state enum is used to track various operations that are in
> progress so that we know not to repeat them, and resume where we left
> off before EAGAIN was returned to cycle out the transaction. Other
> members take the place of local variables that need to retain their
> values across multiple function recalls.  See xfs_attr.h for a more
> detailed diagram of the states.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c        | 218 +++++++++++++++++++++++++++++-----------
>  fs/xfs/libxfs/xfs_attr.h        | 100 ++++++++++++++++++
>  fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
>  fs/xfs/libxfs/xfs_attr_remote.c |  48 +++++----
>  fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
>  fs/xfs/xfs_attr_inactive.c      |   2 +-
>  6 files changed, 288 insertions(+), 84 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 1969b88..b6330f9 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>   */
>  STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>  STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
> -STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
> +STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>  STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>  				 struct xfs_da_state **state);
>  STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> @@ -264,6 +264,34 @@ xfs_attr_set_shortform(
>  }
>  
>  /*
> + * Checks to see if a delayed attribute transaction should be rolled.  If so,
> + * also checks for a defer finish.  Transaction is finished and rolled as
> + * needed, and returns true of false if the delayed operation should continue.
> + */
> +int
> +xfs_attr_trans_roll(
> +	struct xfs_delattr_context	*dac)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	int				error;
> +
> +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> +		/*
> +		 * The caller wants us to finish all the deferred ops so that we
> +		 * avoid pinning the log tail with a large number of deferred
> +		 * ops.
> +		 */
> +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> +		error = xfs_defer_finish(&args->trans);
> +		if (error)
> +			return error;
> +	} else
> +		error = xfs_trans_roll_inode(&args->trans, args->dp);
> +
> +	return error;
> +}
> +
> +/*
>   * Set the attribute specified in @args.
>   */
>  int
> @@ -364,23 +392,58 @@ xfs_has_attr(
>   */
>  int
>  xfs_attr_remove_args(
> -	struct xfs_da_args      *args)
> +	struct xfs_da_args	*args)
>  {
> -	struct xfs_inode	*dp = args->dp;
> -	int			error;
> +	int				error;
> +	struct xfs_delattr_context	dac = {
> +		.da_args	= args,
> +	};
> +
> +	do {
> +		error = xfs_attr_remove_iter(&dac);
> +		if (error != -EAGAIN)
> +			break;
> +
> +		error = xfs_attr_trans_roll(&dac);
> +		if (error)
> +			return error;
> +
> +	} while (true);
> +
> +	return error;
> +}
>  
> -	if (!xfs_inode_hasattr(dp)) {
> -		error = -ENOATTR;
> -	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
> +/*
> + * Remove the attribute specified in @args.
> + *
> + * This function may return -EAGAIN to signal that the transaction needs to be
> + * rolled.  Callers should continue calling this function until they receive a
> + * return value other than -EAGAIN.
> + */
> +int
> +xfs_attr_remove_iter(
> +	struct xfs_delattr_context	*dac)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_inode		*dp = args->dp;
> +
> +	/* If we are shrinking a node, resume shrink */
> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> +		goto node;
> +
> +	if (!xfs_inode_hasattr(dp))
> +		return -ENOATTR;
> +
> +	if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
>  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> -		error = xfs_attr_shortform_remove(args);
> -	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> -		error = xfs_attr_leaf_removename(args);
> -	} else {
> -		error = xfs_attr_node_removename(args);
> +		return xfs_attr_shortform_remove(args);
>  	}
>  
> -	return error;
> +	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> +		return xfs_attr_leaf_removename(args);
> +node:
> +	/* If we are not short form or leaf, then proceed to remove node */
> +	return  xfs_attr_node_removename_iter(dac);
>  }
>  
>  /*
> @@ -1178,10 +1241,11 @@ xfs_attr_leaf_mark_incomplete(
>   */
>  STATIC
>  int xfs_attr_node_removename_setup(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state	**state)
> +	struct xfs_delattr_context	*dac)
>  {

In xfs_attr_node_removename_setup(), if either
xfs_attr_leaf_mark_incomplete() or xfs_attr_rmtval_invalidate() returns a
non-zero value, the memory pointed to by dac->da_state is not freed. This
happens because the caller (i.e. xfs_attr_node_removename_iter()) frees the
memory only if its local variable "state" is non-NULL, and "state" is still
NULL when it jumps to "out" on this error path.
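
The error path described above can be reproduced with a small userspace
model. This is a sketch under assumptions, not the kernel code: the demo_*
names, the live_states counter, and the later_error parameter (which stands
in for a failure of xfs_attr_leaf_mark_incomplete() or
xfs_attr_rmtval_invalidate()) are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_state { int unused; };
struct demo_ctx { struct demo_state *da_state; };

static int live_states;		/* allocations not yet freed */

static struct demo_state *demo_state_alloc(void)
{
	struct demo_state *s = calloc(1, sizeof(*s));

	if (s)
		live_states++;
	return s;
}

static void demo_state_free(struct demo_state *s)
{
	free(s);
	live_states--;
}

/* Allocates into the context, then reports a failure from a later step. */
static int demo_setup(struct demo_ctx *ctx, int later_error)
{
	ctx->da_state = demo_state_alloc();
	if (!ctx->da_state)
		return -ENOMEM;
	return later_error;
}

/* Mirrors the shape of the caller: cleanup keys off the local pointer. */
static int demo_iter_leaky(struct demo_ctx *ctx, int later_error)
{
	struct demo_state *state = NULL;
	int error = 0;

	if (!ctx->da_state) {
		error = demo_setup(ctx, later_error);
		if (error)
			goto out;	/* "state" has not been assigned yet */
	}
	state = ctx->da_state;
out:
	if (state) {
		demo_state_free(state);
		ctx->da_state = NULL;
	}
	return error;
}
```

With a failing later step, demo_iter_leaky() returns the error while
live_states stays at 1, because only ctx->da_state knows about the
allocation.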

> -	int			error;
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_da_state		**state = &dac->da_state;
> +	int				error;
>  
>  	error = xfs_attr_node_hasname(args, state);
>  	if (error != -EEXIST)
> @@ -1203,13 +1267,16 @@ int xfs_attr_node_removename_setup(
>  }
>  
>  STATIC int
> -xfs_attr_node_remove_rmt(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state	*state)
> +xfs_attr_node_remove_rmt (
> +	struct xfs_delattr_context	*dac,
> +	struct xfs_da_state		*state)
>  {
> -	int			error = 0;
> +	int				error = 0;
>  
> -	error = xfs_attr_rmtval_remove(args);
> +	/*
> +	 * May return -EAGAIN to request that the caller recall this function
> +	 */
> +	error = __xfs_attr_rmtval_remove(dac);
>  	if (error)
>  		return error;
>  
> @@ -1240,28 +1307,34 @@ xfs_attr_node_remove_cleanup(
>  }
>  
>  /*
> - * Remove a name from a B-tree attribute list.
> + * Step through removeing a name from a B-tree attribute list.
>   *
>   * This will involve walking down the Btree, and may involve joining
>   * leaf nodes and even joining intermediate nodes up to and including
>   * the root node (a special case of an intermediate node).
> + *
> + * This routine is meant to function as either an inline or delayed operation,
> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> + * functions will need to handle this, and recall the function until a
> + * successful error code is returned.
>   */
>  STATIC int
>  xfs_attr_node_remove_step(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state	*state)
> +	struct xfs_delattr_context	*dac)
>  {
> -	int			error;
> -	struct xfs_inode	*dp = args->dp;
> -
> -
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_da_state		*state = dac->da_state;
> +	int				error = 0;
>  	/*
>  	 * If there is an out-of-line value, de-allocate the blocks.
>  	 * This is done before we remove the attribute so that we don't
>  	 * overflow the maximum size of a transaction and/or hit a deadlock.
>  	 */
>  	if (args->rmtblkno > 0) {
> -		error = xfs_attr_node_remove_rmt(args, state);
> +		/*
> +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
> +		 */
> +		error = xfs_attr_node_remove_rmt(dac, state);
>  		if (error)
>  			return error;
>  	}
> @@ -1274,51 +1347,74 @@ xfs_attr_node_remove_step(
>   *
>   * This routine will find the blocks of the name to remove, remove them and
>   * shrink the tree if needed.
> + *
> + * This routine is meant to function as either an inline or delayed operation,
> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> + * functions will need to handle this, and recall the function until a
> + * successful error code is returned.
>   */
>  STATIC int
> -xfs_attr_node_removename(
> -	struct xfs_da_args	*args)
> +xfs_attr_node_removename_iter(
> +	struct xfs_delattr_context	*dac)
>  {
> -	struct xfs_da_state	*state = NULL;
> -	int			retval, error;
> -	struct xfs_inode	*dp = args->dp;
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_da_state		*state = NULL;
> +	int				retval, error;
> +	struct xfs_inode		*dp = args->dp;
>  
>  	trace_xfs_attr_node_removename(args);
>  
> -	error = xfs_attr_node_removename_setup(args, &state);
> -	if (error)
> -		goto out;
> +	if (!dac->da_state) {
> +		error = xfs_attr_node_removename_setup(dac);
> +		if (error)
> +			goto out;
> +	}
> +	state = dac->da_state;
>  
> -	error = xfs_attr_node_remove_step(args, state);
> -	if (error)
> -		goto out;
> +	switch (dac->dela_state) {
> +	case XFS_DAS_UNINIT:
> +		/*
> +		 * repeatedly remove remote blocks, remove the entry and join.
> +		 * returns -EAGAIN or 0 for completion of the step.
> +		 */
> +		error = xfs_attr_node_remove_step(dac);
> +		if (error)
> +			break;
>  
> -	retval = xfs_attr_node_remove_cleanup(args, state);
> +		retval = xfs_attr_node_remove_cleanup(args, state);
>  
> -	/*
> -	 * Check to see if the tree needs to be collapsed.
> -	 */
> -	if (retval && (state->path.active > 1)) {
> -		error = xfs_da3_join(state);
> -		if (error)
> -			return error;
> -		error = xfs_defer_finish(&args->trans);
> -		if (error)
> -			return error;
>  		/*
> -		 * Commit the Btree join operation and start a new trans.
> +		 * Check to see if the tree needs to be collapsed. Set the flag
> +		 * to indicate that the calling function needs to move the
> +		 * shrink operation
>  		 */
> -		error = xfs_trans_roll_inode(&args->trans, dp);
> -		if (error)
> -			return error;
> -	}
> +		if (retval && (state->path.active > 1)) {
> +			error = xfs_da3_join(state);
> +			if (error)
> +				return error;
>  
> -	/*
> -	 * If the result is small enough, push it all into the inode.
> -	 */
> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> -		error = xfs_attr_node_shrink(args, state);
> +			dac->flags |= XFS_DAC_DEFER_FINISH;
> +			dac->dela_state = XFS_DAS_RM_SHRINK;
> +			return -EAGAIN;
> +		}
> +
> +		/* fallthrough */
> +	case XFS_DAS_RM_SHRINK:
> +		/*
> +		 * If the result is small enough, push it all into the inode.
> +		 */
> +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> +			error = xfs_attr_node_shrink(args, state);
> +
> +		break;
> +	default:
> +		ASSERT(0);
> +		error = -EINVAL;
> +		goto out;
> +	}
>  
> +	if (error == -EAGAIN)
> +		return error;
>  out:
>  	if (state)
>  		xfs_da_state_free(state);
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 3e97a93..3154ef4 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -74,6 +74,102 @@ struct xfs_attr_list_context {
>  };
>  
>  
> +/*
> + * ========================================================================
> + * Structure used to pass context around among the delayed routines.
> + * ========================================================================
> + */
> +
> +/*
> + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
> + * states indicate places where the function would return -EAGAIN, and then
> + * immediately resume from after being recalled by the calling function. States
> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> + * so the calling function needs to pass them back to that subroutine to allow
> + * it to finish where it left off. But they otherwise do not have a role in the
> + * calling function other than just passing through.
> + *
> + * xfs_attr_remove_iter()
> + *              │
> + *              v
> + *        found attr blks? ───n──┐
> + *              │                v
> + *              │         find and invalidate
> + *              y         the blocks. mark
> + *              │         attr incomplete
> + *              ├────────────────┘
> + *              │
> + *              v
> + *      remove a block with
> + *    xfs_attr_node_remove_step <────┐
> + *              │                    │
> + *              v                    │
> + *      still have blks ──y──> return -EAGAIN.
> + *        to remove?          re-enter with one
> + *              │            less blk to remove
> + *              n
> + *              │
> + *              v
> + *       remove leaf and
> + *       update hash with
> + *   xfs_attr_node_remove_cleanup
> + *              │
> + *              v
> + *           need to
> + *        shrink tree? ─n─┐
> + *              │         │
> + *              y         │
> + *              │         │
> + *              v         │
> + *          join leaf     │
> + *              │         │
> + *              v         │
> + *      XFS_DAS_RM_SHRINK │
> + *              │         │
> + *              v         │
> + *       do the shrink    │
> + *              │         │
> + *              v         │
> + *          free state <──┘
> + *              │
> + *              v
> + *            done
> + *
> + */
> +
> +/*
> + * Enum values for xfs_delattr_context.da_state
> + *
> + * These values are used by delayed attribute operations to keep track  of where
> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> + * calling function to roll the transaction, and then recall the subroutine to
> + * finish the operation.  The enum is then used by the subroutine to jump back
> + * to where it was and resume executing where it left off.
> + */
> +enum xfs_delattr_state {
> +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
> +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
> +};
> +
> +/*
> + * Defines for xfs_delattr_context.flags
> + */
> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> +
> +/*
> + * Context used for keeping track of delayed attribute operations
> + */
> +struct xfs_delattr_context {
> +	struct xfs_da_args      *da_args;
> +
> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> +	struct xfs_da_state     *da_state;
> +
> +	/* Used to keep track of current state of delayed operation */
> +	unsigned int            flags;
> +	enum xfs_delattr_state  dela_state;
> +};
> +
>  /*========================================================================
>   * Function prototypes for the kernel.
>   *========================================================================*/
> @@ -91,6 +187,10 @@ int xfs_attr_set(struct xfs_da_args *args);
>  int xfs_attr_set_args(struct xfs_da_args *args);
>  int xfs_has_attr(struct xfs_da_args *args);
>  int xfs_attr_remove_args(struct xfs_da_args *args);
> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>  bool xfs_attr_namecheck(const void *name, size_t length);
> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> +			      struct xfs_da_args *args);
>  
>  #endif	/* __XFS_ATTR_H__ */
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index d6ef69a..3780141 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -19,8 +19,8 @@
>  #include "xfs_bmap_btree.h"
>  #include "xfs_bmap.h"
>  #include "xfs_attr_sf.h"
> -#include "xfs_attr_remote.h"
>  #include "xfs_attr.h"
> +#include "xfs_attr_remote.h"
>  #include "xfs_attr_leaf.h"
>  #include "xfs_error.h"
>  #include "xfs_trace.h"
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> index 48d8e9c..f09820c 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.c
> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
>   */
>  int
>  xfs_attr_rmtval_remove(
> -	struct xfs_da_args      *args)
> +	struct xfs_da_args		*args)
>  {
> -	int			error;
> -	int			retval;
> +	int				error;
> +	struct xfs_delattr_context	dac  = {
> +		.da_args	= args,
> +	};
>  
>  	trace_xfs_attr_rmtval_remove(args);
>  
> @@ -685,31 +687,29 @@ xfs_attr_rmtval_remove(
>  	 * Keep de-allocating extents until the remote-value region is gone.
>  	 */
>  	do {
> -		retval = __xfs_attr_rmtval_remove(args);
> -		if (retval && retval != -EAGAIN)
> -			return retval;
> +		error = __xfs_attr_rmtval_remove(&dac);
> +		if (error != -EAGAIN)
> +			break;
>  
> -		/*
> -		 * Close out trans and start the next one in the chain.
> -		 */
> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> +		error = xfs_attr_trans_roll(&dac);
>  		if (error)
>  			return error;
> -	} while (retval == -EAGAIN);
> +	} while (true);
>  
> -	return 0;
> +	return error;
>  }
>  
>  /*
>   * Remove the value associated with an attribute by deleting the out-of-line
> - * buffer that it is stored on. Returns EAGAIN for the caller to refresh the
> + * buffer that it is stored on. Returns -EAGAIN for the caller to refresh the
>   * transaction and re-call the function
>   */
>  int
>  __xfs_attr_rmtval_remove(
> -	struct xfs_da_args	*args)
> +	struct xfs_delattr_context	*dac)
>  {
> -	int			error, done;
> +	struct xfs_da_args		*args = dac->da_args;
> +	int				error, done;
>  
>  	/*
>  	 * Unmap value blocks for this attr.
> @@ -719,12 +719,20 @@ __xfs_attr_rmtval_remove(
>  	if (error)
>  		return error;
>  
> -	error = xfs_defer_finish(&args->trans);
> -	if (error)
> -		return error;
> -
> -	if (!done)
> +	/*
> +	 * We dont need an explicit state here to pick up where we left off.  We
> +	 * can figure it out using the !done return code.  Calling function only
> +	 * needs to keep recalling this routine until we indicate to stop by
> +	 * returning anything other than -EAGAIN. The actual value of
> +	 * attr->xattri_dela_state may be some value reminicent of the calling
> +	 * function, but it's value is irrelevant with in the context of this
> +	 * function.  Once we are done here, the next state is set as needed
> +	 * by the parent
> +	 */
> +	if (!done) {
> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>  		return -EAGAIN;
> +	}
>  
>  	return error;
>  }
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> index 9eee615..002fd30 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.h
> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
>  int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>  		xfs_buf_flags_t incore_flags);
>  int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>  #endif /* __XFS_ATTR_REMOTE_H__ */
> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> index bfad669..aaa7e66 100644
> --- a/fs/xfs/xfs_attr_inactive.c
> +++ b/fs/xfs/xfs_attr_inactive.c
> @@ -15,10 +15,10 @@
>  #include "xfs_da_format.h"
>  #include "xfs_da_btree.h"
>  #include "xfs_inode.h"
> +#include "xfs_attr.h"
>  #include "xfs_attr_remote.h"
>  #include "xfs_trans.h"
>  #include "xfs_bmap.h"
> -#include "xfs_attr.h"
>  #include "xfs_attr_leaf.h"
>  #include "xfs_quota.h"
>  #include "xfs_dir2.h"
>
Allison Henderson Dec. 22, 2020, 3:41 p.m. UTC | #2
On 12/22/20 12:22 AM, Chandan Babu R wrote:
> On Fri, 18 Dec 2020 00:29:06 -0700, Allison Henderson wrote:
>> This patch modifies the attr remove routines to be delay ready. This
>> means they no longer roll or commit transactions, but instead return
>> -EAGAIN to have the calling routine roll and refresh the transaction. In
>> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
>> uses a sort of state machine like switch to keep track of where it was
>> when EAGAIN was returned. xfs_attr_node_removename has also been
>> modified to use the switch, and a new version of xfs_attr_remove_args
>> consists of a simple loop to refresh the transaction until the operation
>> is completed. A new XFS_DAC_DEFER_FINISH flag is used to finish the
>> transaction where ever the existing code used to.
>>
>> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
>> version __xfs_attr_rmtval_remove. We will rename
>> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
>> done.
>>
>> xfs_attr_rmtval_remove itself is still in use by the set routines (used
>> during a rename).  For reasons of preserving existing function, we
>> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
>> set.  Similar to how xfs_attr_remove_args does here.  Once we transition
>> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
>> used and will be removed.
>>
>> This patch also adds a new struct xfs_delattr_context, which we will use
>> to keep track of the current state of an attribute operation. The new
>> xfs_delattr_state enum is used to track various operations that are in
>> progress so that we know not to repeat them, and resume where we left
>> off before EAGAIN was returned to cycle out the transaction. Other
>> members take the place of local variables that need to retain their
>> values across multiple function recalls.  See xfs_attr.h for a more
>> detailed diagram of the states.
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c        | 218 +++++++++++++++++++++++++++++-----------
>>   fs/xfs/libxfs/xfs_attr.h        | 100 ++++++++++++++++++
>>   fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
>>   fs/xfs/libxfs/xfs_attr_remote.c |  48 +++++----
>>   fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
>>   fs/xfs/xfs_attr_inactive.c      |   2 +-
>>   6 files changed, 288 insertions(+), 84 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index 1969b88..b6330f9 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>>    */
>>   STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>>   STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
>> -STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
>> +STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>>   STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>   				 struct xfs_da_state **state);
>>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>> @@ -264,6 +264,34 @@ xfs_attr_set_shortform(
>>   }
>>   
>>   /*
>> + * Checks to see if a delayed attribute transaction should be rolled.  If so,
>> + * also checks for a defer finish.  Transaction is finished and rolled as
>> + * needed, and returns true of false if the delayed operation should continue.
>> + */
>> +int
>> +xfs_attr_trans_roll(
>> +	struct xfs_delattr_context	*dac)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	int				error;
>> +
>> +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
>> +		/*
>> +		 * The caller wants us to finish all the deferred ops so that we
>> +		 * avoid pinning the log tail with a large number of deferred
>> +		 * ops.
>> +		 */
>> +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
>> +		error = xfs_defer_finish(&args->trans);
>> +		if (error)
>> +			return error;
>> +	} else
>> +		error = xfs_trans_roll_inode(&args->trans, args->dp);
>> +
>> +	return error;
>> +}
>> +
>> +/*
>>    * Set the attribute specified in @args.
>>    */
>>   int
>> @@ -364,23 +392,58 @@ xfs_has_attr(
>>    */
>>   int
>>   xfs_attr_remove_args(
>> -	struct xfs_da_args      *args)
>> +	struct xfs_da_args	*args)
>>   {
>> -	struct xfs_inode	*dp = args->dp;
>> -	int			error;
>> +	int				error;
>> +	struct xfs_delattr_context	dac = {
>> +		.da_args	= args,
>> +	};
>> +
>> +	do {
>> +		error = xfs_attr_remove_iter(&dac);
>> +		if (error != -EAGAIN)
>> +			break;
>> +
>> +		error = xfs_attr_trans_roll(&dac);
>> +		if (error)
>> +			return error;
>> +
>> +	} while (true);
>> +
>> +	return error;
>> +}
>>   
>> -	if (!xfs_inode_hasattr(dp)) {
>> -		error = -ENOATTR;
>> -	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
>> +/*
>> + * Remove the attribute specified in @args.
>> + *
>> + * This function may return -EAGAIN to signal that the transaction needs to be
>> + * rolled.  Callers should continue calling this function until they receive a
>> + * return value other than -EAGAIN.
>> + */
>> +int
>> +xfs_attr_remove_iter(
>> +	struct xfs_delattr_context	*dac)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_inode		*dp = args->dp;
>> +
>> +	/* If we are shrinking a node, resume shrink */
>> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
>> +		goto node;
>> +
>> +	if (!xfs_inode_hasattr(dp))
>> +		return -ENOATTR;
>> +
>> +	if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
>>   		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
>> -		error = xfs_attr_shortform_remove(args);
>> -	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>> -		error = xfs_attr_leaf_removename(args);
>> -	} else {
>> -		error = xfs_attr_node_removename(args);
>> +		return xfs_attr_shortform_remove(args);
>>   	}
>>   
>> -	return error;
>> +	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>> +		return xfs_attr_leaf_removename(args);
>> +node:
>> +	/* If we are not short form or leaf, then proceed to remove node */
>> +	return  xfs_attr_node_removename_iter(dac);
>>   }
>>   
>>   /*
>> @@ -1178,10 +1241,11 @@ xfs_attr_leaf_mark_incomplete(
>>    */
>>   STATIC
>>   int xfs_attr_node_removename_setup(
>> -	struct xfs_da_args	*args,
>> -	struct xfs_da_state	**state)
>> +	struct xfs_delattr_context	*dac)
>>   {
> 
> In xfs_attr_node_removename_setup(), if either of
> xfs_attr_leaf_mark_incomplete() or xfs_attr_rmtval_invalidate() returns with a
> non-zero value, the memory pointed to by dac->da_state is not freed. This
> happens because the caller (i.e. xfs_attr_node_removename_iter()) checks for
> the non-NULL value of its local variable "state" to actually free the
> corresponding memory.
> 
Ok, for this one I think it makes more sense to put an extra free in
the helper rather than have the caller handle it.  Will fix.
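
One way the "extra free in the helper" could look is sketched below as a
userspace model. It is an assumption about the shape of the fix, not the
change that was eventually posted: the demo_* names, the live_states
counter, and the later_error parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_state { int unused; };
struct demo_ctx { struct demo_state *da_state; };

static int live_states;		/* allocations not yet freed */

static struct demo_state *demo_state_alloc(void)
{
	struct demo_state *s = calloc(1, sizeof(*s));

	if (s)
		live_states++;
	return s;
}

static void demo_state_free(struct demo_state *s)
{
	free(s);
	live_states--;
}

/*
 * The helper owns the allocation until it succeeds: on any later failure
 * it frees the state and clears the context pointer, so the caller never
 * sees a half-initialised context.
 */
static int demo_setup_fixed(struct demo_ctx *ctx, int later_error)
{
	ctx->da_state = demo_state_alloc();
	if (!ctx->da_state)
		return -ENOMEM;

	if (later_error) {
		demo_state_free(ctx->da_state);
		ctx->da_state = NULL;
		return later_error;
	}
	return 0;
}
```

Clearing ctx->da_state on failure also keeps a later "if (!ctx->da_state)"
check in the caller meaningful, since a stale pointer would otherwise skip
setup on the next call.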

Do you have a tool that's tracing this out, or is it just by hand?
Because if it's a tool, I should probably be using it too :-)

Thanks!
Allison


>> -	int			error;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_da_state		**state = &dac->da_state;
>> +	int				error;
>>   
>>   	error = xfs_attr_node_hasname(args, state);
>>   	if (error != -EEXIST)
>> @@ -1203,13 +1267,16 @@ int xfs_attr_node_removename_setup(
>>   }
>>   
>>   STATIC int
>> -xfs_attr_node_remove_rmt(
>> -	struct xfs_da_args	*args,
>> -	struct xfs_da_state	*state)
>> +xfs_attr_node_remove_rmt (
>> +	struct xfs_delattr_context	*dac,
>> +	struct xfs_da_state		*state)
>>   {
>> -	int			error = 0;
>> +	int				error = 0;
>>   
>> -	error = xfs_attr_rmtval_remove(args);
>> +	/*
>> +	 * May return -EAGAIN to request that the caller recall this function
>> +	 */
>> +	error = __xfs_attr_rmtval_remove(dac);
>>   	if (error)
>>   		return error;
>>   
>> @@ -1240,28 +1307,34 @@ xfs_attr_node_remove_cleanup(
>>   }
>>   
>>   /*
>> - * Remove a name from a B-tree attribute list.
>> + * Step through removeing a name from a B-tree attribute list.
>>    *
>>    * This will involve walking down the Btree, and may involve joining
>>    * leaf nodes and even joining intermediate nodes up to and including
>>    * the root node (a special case of an intermediate node).
>> + *
>> + * This routine is meant to function as either an inline or delayed operation,
>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>> + * functions will need to handle this, and recall the function until a
>> + * successful error code is returned.
>>    */
>>   STATIC int
>>   xfs_attr_node_remove_step(
>> -	struct xfs_da_args	*args,
>> -	struct xfs_da_state	*state)
>> +	struct xfs_delattr_context	*dac)
>>   {
>> -	int			error;
>> -	struct xfs_inode	*dp = args->dp;
>> -
>> -
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_da_state		*state = dac->da_state;
>> +	int				error = 0;
>>   	/*
>>   	 * If there is an out-of-line value, de-allocate the blocks.
>>   	 * This is done before we remove the attribute so that we don't
>>   	 * overflow the maximum size of a transaction and/or hit a deadlock.
>>   	 */
>>   	if (args->rmtblkno > 0) {
>> -		error = xfs_attr_node_remove_rmt(args, state);
>> +		/*
>> +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
>> +		 */
>> +		error = xfs_attr_node_remove_rmt(dac, state);
>>   		if (error)
>>   			return error;
>>   	}
>> @@ -1274,51 +1347,74 @@ xfs_attr_node_remove_step(
>>    *
>>    * This routine will find the blocks of the name to remove, remove them and
>>    * shrink the tree if needed.
>> + *
>> + * This routine is meant to function as either an inline or delayed operation,
>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>> + * functions will need to handle this, and recall the function until a
>> + * successful error code is returned.
>>    */
>>   STATIC int
>> -xfs_attr_node_removename(
>> -	struct xfs_da_args	*args)
>> +xfs_attr_node_removename_iter(
>> +	struct xfs_delattr_context	*dac)
>>   {
>> -	struct xfs_da_state	*state = NULL;
>> -	int			retval, error;
>> -	struct xfs_inode	*dp = args->dp;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_da_state		*state = NULL;
>> +	int				retval, error;
>> +	struct xfs_inode		*dp = args->dp;
>>   
>>   	trace_xfs_attr_node_removename(args);
>>   
>> -	error = xfs_attr_node_removename_setup(args, &state);
>> -	if (error)
>> -		goto out;
>> +	if (!dac->da_state) {
>> +		error = xfs_attr_node_removename_setup(dac);
>> +		if (error)
>> +			goto out;
>> +	}
>> +	state = dac->da_state;
>>   
>> -	error = xfs_attr_node_remove_step(args, state);
>> -	if (error)
>> -		goto out;
>> +	switch (dac->dela_state) {
>> +	case XFS_DAS_UNINIT:
>> +		/*
>> +		 * repeatedly remove remote blocks, remove the entry and join.
>> +		 * returns -EAGAIN or 0 for completion of the step.
>> +		 */
>> +		error = xfs_attr_node_remove_step(dac);
>> +		if (error)
>> +			break;
>>   
>> -	retval = xfs_attr_node_remove_cleanup(args, state);
>> +		retval = xfs_attr_node_remove_cleanup(args, state);
>>   
>> -	/*
>> -	 * Check to see if the tree needs to be collapsed.
>> -	 */
>> -	if (retval && (state->path.active > 1)) {
>> -		error = xfs_da3_join(state);
>> -		if (error)
>> -			return error;
>> -		error = xfs_defer_finish(&args->trans);
>> -		if (error)
>> -			return error;
>>   		/*
>> -		 * Commit the Btree join operation and start a new trans.
>> +		 * Check to see if the tree needs to be collapsed. Set the flag
>> +		 * to indicate that the calling function needs to move on to the
>> +		 * shrink operation.
>>   		 */
>> -		error = xfs_trans_roll_inode(&args->trans, dp);
>> -		if (error)
>> -			return error;
>> -	}
>> +		if (retval && (state->path.active > 1)) {
>> +			error = xfs_da3_join(state);
>> +			if (error)
>> +				return error;
>>   
>> -	/*
>> -	 * If the result is small enough, push it all into the inode.
>> -	 */
>> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>> -		error = xfs_attr_node_shrink(args, state);
>> +			dac->flags |= XFS_DAC_DEFER_FINISH;
>> +			dac->dela_state = XFS_DAS_RM_SHRINK;
>> +			return -EAGAIN;
>> +		}
>> +
>> +		/* fallthrough */
>> +	case XFS_DAS_RM_SHRINK:
>> +		/*
>> +		 * If the result is small enough, push it all into the inode.
>> +		 */
>> +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>> +			error = xfs_attr_node_shrink(args, state);
>> +
>> +		break;
>> +	default:
>> +		ASSERT(0);
>> +		error = -EINVAL;
>> +		goto out;
>> +	}
>>   
>> +	if (error == -EAGAIN)
>> +		return error;
>>   out:
>>   	if (state)
>>   		xfs_da_state_free(state);
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index 3e97a93..3154ef4 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -74,6 +74,102 @@ struct xfs_attr_list_context {
>>   };
>>   
>>   
>> +/*
>> + * ========================================================================
>> + * Structure used to pass context around among the delayed routines.
>> + * ========================================================================
>> + */
>> +
>> +/*
>> + * Below is a state machine diagram for attr remove operations. The XFS_DAS_*
>> + * states indicate places where the function would return -EAGAIN, and then
>> + * resume from as soon as it is recalled by the calling function. States
>> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
>> + * so the calling function needs to pass them back to that subroutine to allow
>> + * it to finish where it left off. But they otherwise do not have a role in the
>> + * calling function other than just passing through.
>> + *
>> + * xfs_attr_remove_iter()
>> + *              │
>> + *              v
>> + *        found attr blks? ───n──┐
>> + *              │                v
>> + *              │         find and invalidate
>> + *              y         the blocks. mark
>> + *              │         attr incomplete
>> + *              ├────────────────┘
>> + *              │
>> + *              v
>> + *      remove a block with
>> + *    xfs_attr_node_remove_step <────┐
>> + *              │                    │
>> + *              v                    │
>> + *      still have blks ──y──> return -EAGAIN.
>> + *        to remove?          re-enter with one
>> + *              │            less blk to remove
>> + *              n
>> + *              │
>> + *              v
>> + *       remove leaf and
>> + *       update hash with
>> + *   xfs_attr_node_remove_cleanup
>> + *              │
>> + *              v
>> + *           need to
>> + *        shrink tree? ─n─┐
>> + *              │         │
>> + *              y         │
>> + *              │         │
>> + *              v         │
>> + *          join leaf     │
>> + *              │         │
>> + *              v         │
>> + *      XFS_DAS_RM_SHRINK │
>> + *              │         │
>> + *              v         │
>> + *       do the shrink    │
>> + *              │         │
>> + *              v         │
>> + *          free state <──┘
>> + *              │
>> + *              v
>> + *            done
>> + *
>> + */
>> +
>> +/*
>> + * Enum values for xfs_delattr_context.dela_state
>> + *
>> + * These values are used by delayed attribute operations to keep track of where
>> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
>> + * calling function to roll the transaction, and then recall the subroutine to
>> + * finish the operation.  The enum is then used by the subroutine to jump back
>> + * to where it was and resume executing where it left off.
>> + */
>> +enum xfs_delattr_state {
>> +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
>> +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
>> +};
>> +
>> +/*
>> + * Defines for xfs_delattr_context.flags
>> + */
>> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>> +
>> +/*
>> + * Context used for keeping track of delayed attribute operations
>> + */
>> +struct xfs_delattr_context {
>> +	struct xfs_da_args      *da_args;
>> +
>> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
>> +	struct xfs_da_state     *da_state;
>> +
>> +	/* Used to keep track of current state of delayed operation */
>> +	unsigned int            flags;
>> +	enum xfs_delattr_state  dela_state;
>> +};
>> +
>>   /*========================================================================
>>    * Function prototypes for the kernel.
>>    *========================================================================*/
>> @@ -91,6 +187,10 @@ int xfs_attr_set(struct xfs_da_args *args);
>>   int xfs_attr_set_args(struct xfs_da_args *args);
>>   int xfs_has_attr(struct xfs_da_args *args);
>>   int xfs_attr_remove_args(struct xfs_da_args *args);
>> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>>   bool xfs_attr_namecheck(const void *name, size_t length);
>> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>> +			      struct xfs_da_args *args);
>>   
>>   #endif	/* __XFS_ATTR_H__ */
>> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
>> index d6ef69a..3780141 100644
>> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
>> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
>> @@ -19,8 +19,8 @@
>>   #include "xfs_bmap_btree.h"
>>   #include "xfs_bmap.h"
>>   #include "xfs_attr_sf.h"
>> -#include "xfs_attr_remote.h"
>>   #include "xfs_attr.h"
>> +#include "xfs_attr_remote.h"
>>   #include "xfs_attr_leaf.h"
>>   #include "xfs_error.h"
>>   #include "xfs_trace.h"
>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
>> index 48d8e9c..f09820c 100644
>> --- a/fs/xfs/libxfs/xfs_attr_remote.c
>> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
>> @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
>>    */
>>   int
>>   xfs_attr_rmtval_remove(
>> -	struct xfs_da_args      *args)
>> +	struct xfs_da_args		*args)
>>   {
>> -	int			error;
>> -	int			retval;
>> +	int				error;
>> +	struct xfs_delattr_context	dac  = {
>> +		.da_args	= args,
>> +	};
>>   
>>   	trace_xfs_attr_rmtval_remove(args);
>>   
>> @@ -685,31 +687,29 @@ xfs_attr_rmtval_remove(
>>   	 * Keep de-allocating extents until the remote-value region is gone.
>>   	 */
>>   	do {
>> -		retval = __xfs_attr_rmtval_remove(args);
>> -		if (retval && retval != -EAGAIN)
>> -			return retval;
>> +		error = __xfs_attr_rmtval_remove(&dac);
>> +		if (error != -EAGAIN)
>> +			break;
>>   
>> -		/*
>> -		 * Close out trans and start the next one in the chain.
>> -		 */
>> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
>> +		error = xfs_attr_trans_roll(&dac);
>>   		if (error)
>>   			return error;
>> -	} while (retval == -EAGAIN);
>> +	} while (true);
>>   
>> -	return 0;
>> +	return error;
>>   }
>>   
>>   /*
>>    * Remove the value associated with an attribute by deleting the out-of-line
>> - * buffer that it is stored on. Returns EAGAIN for the caller to refresh the
>> + * buffer that it is stored on. Returns -EAGAIN for the caller to refresh the
>>    * transaction and re-call the function
>>    */
>>   int
>>   __xfs_attr_rmtval_remove(
>> -	struct xfs_da_args	*args)
>> +	struct xfs_delattr_context	*dac)
>>   {
>> -	int			error, done;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	int				error, done;
>>   
>>   	/*
>>   	 * Unmap value blocks for this attr.
>> @@ -719,12 +719,20 @@ __xfs_attr_rmtval_remove(
>>   	if (error)
>>   		return error;
>>   
>> -	error = xfs_defer_finish(&args->trans);
>> -	if (error)
>> -		return error;
>> -
>> -	if (!done)
>> +	/*
>> +	 * We don't need an explicit state here to pick up where we left off.  We
>> +	 * can figure it out using the !done return code.  The calling function only
>> +	 * needs to keep recalling this routine until we indicate to stop by
>> +	 * returning anything other than -EAGAIN.  The actual value of
>> +	 * dac->dela_state may be some value reminiscent of the calling
>> +	 * function, but its value is irrelevant within the context of this
>> +	 * function.  Once we are done here, the next state is set as needed
>> +	 * by the parent.
>> +	 */
>> +	if (!done) {
>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>   		return -EAGAIN;
>> +	}
>>   
>>   	return error;
>>   }
>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
>> index 9eee615..002fd30 100644
>> --- a/fs/xfs/libxfs/xfs_attr_remote.h
>> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
>> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
>>   int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>>   		xfs_buf_flags_t incore_flags);
>>   int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
>> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
>> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>>   #endif /* __XFS_ATTR_REMOTE_H__ */
>> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
>> index bfad669..aaa7e66 100644
>> --- a/fs/xfs/xfs_attr_inactive.c
>> +++ b/fs/xfs/xfs_attr_inactive.c
>> @@ -15,10 +15,10 @@
>>   #include "xfs_da_format.h"
>>   #include "xfs_da_btree.h"
>>   #include "xfs_inode.h"
>> +#include "xfs_attr.h"
>>   #include "xfs_attr_remote.h"
>>   #include "xfs_trans.h"
>>   #include "xfs_bmap.h"
>> -#include "xfs_attr.h"
>>   #include "xfs_attr_leaf.h"
>>   #include "xfs_quota.h"
>>   #include "xfs_dir2.h"
>>
> 
>
Brian Foster Dec. 22, 2020, 5:11 p.m. UTC | #3
On Fri, Dec 18, 2020 at 12:29:06AM -0700, Allison Henderson wrote:
> This patch modifies the attr remove routines to be delay ready. This
> means they no longer roll or commit transactions, but instead return
> -EAGAIN to have the calling routine roll and refresh the transaction. In
> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> uses a sort of state machine like switch to keep track of where it was
> when EAGAIN was returned. xfs_attr_node_removename has also been
> modified to use the switch, and a new version of xfs_attr_remove_args
> consists of a simple loop to refresh the transaction until the operation
> is completed. A new XFS_DAC_DEFER_FINISH flag is used to finish the
> transaction where ever the existing code used to.
> 
> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> version __xfs_attr_rmtval_remove. We will rename
> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> done.
> 
> xfs_attr_rmtval_remove itself is still in use by the set routines (used
> during a rename).  For reasons of preserving existing function, we
> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> used and will be removed.
> 
> This patch also adds a new struct xfs_delattr_context, which we will use
> to keep track of the current state of an attribute operation. The new
> xfs_delattr_state enum is used to track various operations that are in
> progress so that we know not to repeat them, and resume where we left
> off before EAGAIN was returned to cycle out the transaction. Other
> members take the place of local variables that need to retain their
> values across multiple function recalls.  See xfs_attr.h for a more
> detailed diagram of the states.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
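
For anyone skimming the thread, the convention described in the commit
message above can be modelled in userspace roughly as follows. This is
only a sketch under assumptions: every toy_* name, the counters and the
one-block-per-call behaviour are made up for illustration and are not
kernel definitions. Only the -EAGAIN convention, the fallthrough switch
and the shape of the do/while driver loop are taken from the patch, and
what toy_trans_roll() does between steps is a guess at the role of
xfs_attr_trans_roll, which is not shown in this part of the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins modelled on the patch; not the kernel definitions. */
enum toy_state { TOY_UNINIT = 0, TOY_RM_SHRINK };
#define TOY_DEFER_FINISH 0x01u

struct toy_ctx {
	enum toy_state	state;		/* where to resume after -EAGAIN */
	unsigned int	flags;		/* requests for the caller */
	int		rmt_blocks;	/* remote blocks still to unmap */
	bool		need_join;	/* pretend the leaf removal needs a join */
	bool		shrunk;		/* set when the final step has run */
	int		rolls;		/* simulated transaction rolls */
};

/* One step of the operation; -EAGAIN asks the caller for a fresh transaction. */
int toy_remove_iter(struct toy_ctx *ctx)
{
	switch (ctx->state) {
	case TOY_UNINIT:
		/* Re-entered in this state until all remote blocks are gone. */
		if (ctx->rmt_blocks > 0) {
			ctx->rmt_blocks--;
			if (ctx->rmt_blocks > 0) {
				ctx->flags |= TOY_DEFER_FINISH;
				return -EAGAIN;
			}
		}
		/* A join queues more work, so record where to resume. */
		if (ctx->need_join) {
			ctx->flags |= TOY_DEFER_FINISH;
			ctx->state = TOY_RM_SHRINK;
			return -EAGAIN;
		}
		/* fallthrough */
	case TOY_RM_SHRINK:
		ctx->shrunk = true;
		return 0;
	default:
		assert(0);
		return -EINVAL;
	}
}

/* Caller-side housekeeping between steps (assumed behaviour). */
int toy_trans_roll(struct toy_ctx *ctx)
{
	ctx->flags &= ~TOY_DEFER_FINISH;
	ctx->rolls++;
	return 0;
}

/* Driver loop with the same shape as the new xfs_attr_remove_args. */
int toy_remove(struct toy_ctx *ctx)
{
	int error;

	do {
		error = toy_remove_iter(ctx);
		if (error != -EAGAIN)
			break;
		error = toy_trans_roll(ctx);
		if (error)
			return error;
	} while (true);

	return error;
}
```

Calling toy_remove() on a context with three remote blocks and
need_join set goes through three rolls before the shrink step runs. The
state only changes on the last of them, which is the point of keeping
the remote block loop stateless.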

I started with a couple of small comments on this patch, but inevitably
started thinking more about the factoring again and ended up with a
couple of patches on top. The first is mostly small tweaks and
open-coding that IMO make this patch a bit easier to follow. The
second is more of an RFC, so I'll follow up with that in a second email.
I'm curious what folks' thoughts might be on either. Also note that I'm
primarily focusing on code structure here, so these are fast and loose,
compile tested only and likely to be broken.

First diff:

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index b6330f953f40..2e466c4ac283 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -58,6 +58,9 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
 				 struct xfs_da_state **state);
 STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
 STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
+STATIC
+int xfs_attr_node_removename_setup(
+	struct xfs_delattr_context	*dac);
 
 int
 xfs_inode_hasattr(
@@ -395,12 +398,34 @@ xfs_attr_remove_args(
 	struct xfs_da_args	*args)
 {
 	int				error;
+	struct xfs_inode		*dp = args->dp;
 	struct xfs_delattr_context	dac = {
+		.dela_state	= XFS_DAS_UNINIT,
 		.da_args	= args,
 	};
 
+	if (!xfs_inode_hasattr(dp))
+		return -ENOATTR;
+
+	if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
+		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
+		return xfs_attr_shortform_remove(args);
+	}
+
+	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
+		return xfs_attr_leaf_removename(args);
+
+	/* node format requires multiple transactions... */
+
+	trace_xfs_attr_node_removename(args);
+	if (!dac.da_state) {
+		error = xfs_attr_node_removename_setup(&dac);
+		if (error)
+			return error;
+	}
+
 	do {
-		error = xfs_attr_remove_iter(&dac);
+		error = xfs_attr_node_removename_iter(&dac);
 		if (error != -EAGAIN)
 			break;
 
@@ -413,39 +438,6 @@ xfs_attr_remove_args(
 	return error;
 }
 
-/*
- * Remove the attribute specified in @args.
- *
- * This function may return -EAGAIN to signal that the transaction needs to be
- * rolled.  Callers should continue calling this function until they receive a
- * return value other than -EAGAIN.
- */
-int
-xfs_attr_remove_iter(
-	struct xfs_delattr_context	*dac)
-{
-	struct xfs_da_args		*args = dac->da_args;
-	struct xfs_inode		*dp = args->dp;
-
-	/* If we are shrinking a node, resume shrink */
-	if (dac->dela_state == XFS_DAS_RM_SHRINK)
-		goto node;
-
-	if (!xfs_inode_hasattr(dp))
-		return -ENOATTR;
-
-	if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
-		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
-		return xfs_attr_shortform_remove(args);
-	}
-
-	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
-		return xfs_attr_leaf_removename(args);
-node:
-	/* If we are not short form or leaf, then proceed to remove node */
-	return  xfs_attr_node_removename_iter(dac);
-}
-
 /*
  * Note: If args->value is NULL the attribute will be removed, just like the
  * Linux ->setattr API.
@@ -1266,46 +1258,6 @@ int xfs_attr_node_removename_setup(
 	return 0;
 }
 
-STATIC int
-xfs_attr_node_remove_rmt (
-	struct xfs_delattr_context	*dac,
-	struct xfs_da_state		*state)
-{
-	int				error = 0;
-
-	/*
-	 * May return -EAGAIN to request that the caller recall this function
-	 */
-	error = __xfs_attr_rmtval_remove(dac);
-	if (error)
-		return error;
-
-	/*
-	 * Refill the state structure with buffers, the prior calls released our
-	 * buffers.
-	 */
-	return xfs_attr_refillstate(state);
-}
-
-STATIC int
-xfs_attr_node_remove_cleanup(
-	struct xfs_da_args	*args,
-	struct xfs_da_state	*state)
-{
-	struct xfs_da_state_blk	*blk;
-	int			retval;
-
-	/*
-	 * Remove the name and update the hashvals in the tree.
-	 */
-	blk = &state->path.blk[state->path.active-1];
-	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
-	retval = xfs_attr3_leaf_remove(blk->bp, args);
-	xfs_da3_fixhashpath(state, &state->path);
-
-	return retval;
-}
-
 /*
  * Step through removing a name from a B-tree attribute list.
  *
@@ -1320,25 +1272,54 @@ xfs_attr_node_remove_cleanup(
  */
 STATIC int
 xfs_attr_node_remove_step(
-	struct xfs_delattr_context	*dac)
+	struct xfs_delattr_context	*dac,
+	bool				*joined)
 {
 	struct xfs_da_args		*args = dac->da_args;
 	struct xfs_da_state		*state = dac->da_state;
-	int				error = 0;
+	struct xfs_da_state_blk		*blk;
+	int				error = 0, retval, done;
+
 	/*
-	 * If there is an out-of-line value, de-allocate the blocks.
-	 * This is done before we remove the attribute so that we don't
-	 * overflow the maximum size of a transaction and/or hit a deadlock.
+	 * If there is an out-of-line value, de-allocate the blocks.  This is
+	 * done before we remove the attribute so that we don't overflow the
+	 * maximum size of a transaction and/or hit a deadlock.
 	 */
 	if (args->rmtblkno > 0) {
-		/*
-		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
-		 */
-		error = xfs_attr_node_remove_rmt(dac, state);
+		error = xfs_bunmapi(args->trans, args->dp, args->rmtblkno,
+				args->rmtblkcnt, XFS_BMAPI_ATTRFORK, 1, &done);
+		if (error)
+			return error;
+		if (!done) {
+			dac->flags |= XFS_DAC_DEFER_FINISH;
+			return -EAGAIN;
+		}
+
+		error = xfs_attr_refillstate(state);
 		if (error)
 			return error;
 	}
 
+	/*
+	 * Remove the name and update the hashvals in the tree.
+	 */
+	blk = &state->path.blk[state->path.active-1];
+	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
+	retval = xfs_attr3_leaf_remove(blk->bp, args);
+	xfs_da3_fixhashpath(state, &state->path);
+
+	/*
+	 * Check to see if the tree needs to be collapsed. Set the flag to
+	 * indicate that the calling function needs to move on to the shrink
+	 * operation.
+	 */
+	if (retval && (state->path.active > 1)) {
+		error = xfs_da3_join(state);
+		if (error)
+			return error;
+		*joined = true;
+	}
+
 	return error;
 }
 
@@ -1358,18 +1339,10 @@ xfs_attr_node_removename_iter(
 	struct xfs_delattr_context	*dac)
 {
 	struct xfs_da_args		*args = dac->da_args;
-	struct xfs_da_state		*state = NULL;
-	int				retval, error;
+	struct xfs_da_state		*state = dac->da_state;
+	int				error;
 	struct xfs_inode		*dp = args->dp;
-
-	trace_xfs_attr_node_removename(args);
-
-	if (!dac->da_state) {
-		error = xfs_attr_node_removename_setup(dac);
-		if (error)
-			goto out;
-	}
-	state = dac->da_state;
+	bool				joined = false;
 
 	switch (dac->dela_state) {
 	case XFS_DAS_UNINIT:
@@ -1377,27 +1350,14 @@ xfs_attr_node_removename_iter(
 		 * repeatedly remove remote blocks, remove the entry and join.
 		 * returns -EAGAIN or 0 for completion of the step.
 		 */
-		error = xfs_attr_node_remove_step(dac);
+		error = xfs_attr_node_remove_step(dac, &joined);
 		if (error)
-			break;
-
-		retval = xfs_attr_node_remove_cleanup(args, state);
-
-		/*
-		 * Check to see if the tree needs to be collapsed. Set the flag
-		 * to indicate that the calling function needs to move on to the
-		 * shrink operation.
-		 */
-		if (retval && (state->path.active > 1)) {
-			error = xfs_da3_join(state);
-			if (error)
-				return error;
-
+			goto out;
+		if (joined) {
 			dac->flags |= XFS_DAC_DEFER_FINISH;
 			dac->dela_state = XFS_DAS_RM_SHRINK;
 			return -EAGAIN;
 		}
-
 		/* fallthrough */
 	case XFS_DAS_RM_SHRINK:
 		/*
@@ -1405,7 +1365,6 @@ xfs_attr_node_removename_iter(
 		 */
 		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
 			error = xfs_attr_node_shrink(args, state);
-
 		break;
 	default:
 		ASSERT(0);
@@ -1413,10 +1372,8 @@ xfs_attr_node_removename_iter(
 		goto out;
 	}
 
-	if (error == -EAGAIN)
-		return error;
 out:
-	if (state)
+	if (state && error != -EAGAIN)
 		xfs_da_state_free(state);
 	return error;
 }
Brian Foster Dec. 22, 2020, 5:20 p.m. UTC | #4
On Tue, Dec 22, 2020 at 12:11:48PM -0500, Brian Foster wrote:
> I started with a couple of small comments on this patch, but inevitably
> started thinking more about the factoring again and ended up with a
> couple of patches on top. The first is mostly small tweaks and
> open-coding that IMO make this patch a bit easier to follow. The
> second is more of an RFC, so I'll follow up with that in a second email.
> I'm curious what folks' thoughts might be on either. Also note that I'm
> primarily focusing on code structure here, so these are fast and loose,
> compile tested only and likely to be broken.
> 

... and here's the second diff (applies on top of the first).

This one popped up after staring at the previous changes for a bit and
wondering whether using "done flags" might make the whole thing easier
to follow than incremental state transitions. I think the attr remove
path is easy enough to follow with either method, but the attr set path
is a beast and so this is more with that in mind. Initial thoughts?
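
To make the comparison concrete, here is a userspace toy of the done
flags variant. Again this is a sketch under assumptions: the toy_* names
and fields are hypothetical, and only the flag checks and the point at
which each flag gets set mirror the RFC diff. In particular the join
step marks itself done even when it returns -EAGAIN, so the next call
skips straight to the shrink.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag names modelled on the RFC; not kernel definitions. */
#define TOY_RMT_DONE	0x1u
#define TOY_JOIN_DONE	0x2u

struct toy_dac {
	unsigned int	done;		/* bitmask of completed steps */
	int		rmt_blocks;	/* remote blocks left to unmap */
	int		need_join;	/* nonzero if the removal needs a join */
	int		shrunk;		/* set once the final step has run */
};

/* Unmap one block per call; the flag stays clear while blocks remain. */
int toy_rmt_step(struct toy_dac *dac)
{
	assert(dac->rmt_blocks >= 0);
	if (dac->rmt_blocks > 0) {
		dac->rmt_blocks--;
		if (dac->rmt_blocks > 0)
			return -EAGAIN;
	}
	dac->done |= TOY_RMT_DONE;
	return 0;
}

/* Remove the entry and maybe join; done is set on both return paths. */
int toy_join_step(struct toy_dac *dac)
{
	int error = 0;

	if (dac->need_join)
		error = -EAGAIN;	/* roll before the shrink step */
	dac->done |= TOY_JOIN_DONE;
	return error;
}

/* Each call skips the steps already flagged as done. */
int toy_flags_remove_iter(struct toy_dac *dac)
{
	int error;

	if (!(dac->done & TOY_RMT_DONE)) {
		error = toy_rmt_step(dac);
		if (error)
			return error;
	}

	if (!(dac->done & TOY_JOIN_DONE)) {
		error = toy_join_step(dac);
		if (error)
			return error;
	}

	dac->shrunk = 1;
	return 0;
}
```

With two remote blocks and need_join set, the first two calls return
-EAGAIN and the third completes. Compared with the switch, no step
needs to know which step comes next, at the cost of one bit per step.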

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 2e466c4ac283..106e3c070131 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -1271,14 +1271,12 @@ int xfs_attr_node_removename_setup(
  * successful error code is returned.
  */
 STATIC int
-xfs_attr_node_remove_step(
-	struct xfs_delattr_context	*dac,
-	bool				*joined)
+xfs_attr_node_remove_rmt_step(
+	struct xfs_delattr_context	*dac)
 {
 	struct xfs_da_args		*args = dac->da_args;
 	struct xfs_da_state		*state = dac->da_state;
-	struct xfs_da_state_blk		*blk;
-	int				error = 0, retval, done;
+	int				error, done;
 
 	/*
 	 * If there is an out-of-line value, de-allocate the blocks.  This is
@@ -1300,6 +1298,19 @@ xfs_attr_node_remove_step(
 			return error;
 	}
 
+	dac->dela_state |= XFS_DAS_RMT_DONE;
+	return error;
+}
+
+STATIC int
+xfs_attr_node_remove_join_step(
+	struct xfs_delattr_context	*dac)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_state		*state = dac->da_state;
+	struct xfs_da_state_blk		*blk;
+	int				error, retval;
+
 	/*
 	 * Remove the name and update the hashvals in the tree.
 	 */
@@ -1317,9 +1328,12 @@ xfs_attr_node_remove_step(
 		error = xfs_da3_join(state);
 		if (error)
 			return error;
-		*joined = true;
+
+		error = -EAGAIN;
+		dac->flags |= XFS_DAC_DEFER_FINISH;
 	}
 
+	dac->dela_state |= XFS_DAS_JOIN_DONE;
 	return error;
 }
 
@@ -1342,36 +1356,23 @@ xfs_attr_node_removename_iter(
 	struct xfs_da_state		*state = dac->da_state;
 	int				error;
 	struct xfs_inode		*dp = args->dp;
-	bool				joined = false;
 
-	switch (dac->dela_state) {
-	case XFS_DAS_UNINIT:
-		/*
-		 * repeatedly remove remote blocks, remove the entry and join.
-		 * returns -EAGAIN or 0 for completion of the step.
-		 */
-		error = xfs_attr_node_remove_step(dac, &joined);
+	if (!(dac->dela_state & XFS_DAS_RMT_DONE)) {
+		error = xfs_attr_node_remove_rmt_step(dac);
 		if (error)
 			goto out;
-		if (joined) {
-			dac->flags |= XFS_DAC_DEFER_FINISH;
-			dac->dela_state = XFS_DAS_RM_SHRINK;
-			return -EAGAIN;
-		}
-		/* fallthrough */
-	case XFS_DAS_RM_SHRINK:
-		/*
-		 * If the result is small enough, push it all into the inode.
-		 */
-		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
-			error = xfs_attr_node_shrink(args, state);
-		break;
-	default:
-		ASSERT(0);
-		error = -EINVAL;
-		goto out;
 	}
 
+	if (!(dac->dela_state & XFS_DAS_JOIN_DONE)) {
+		error = xfs_attr_node_remove_join_step(dac);
+		if (error)
+			goto out;
+	}
+
+	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
+		error = xfs_attr_node_shrink(args, state);
+	ASSERT(error != -EAGAIN);
+
 out:
 	if (state && error != -EAGAIN)
 		xfs_da_state_free(state);
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 3154ef4b7833..67e730cd3267 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -151,6 +151,9 @@ enum xfs_delattr_state {
 	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
 };
 
+#define XFS_DAS_RMT_DONE	0x1
+#define XFS_DAS_JOIN_DONE	0x2
+
 /*
  * Defines for xfs_delattr_context.flags
  */
Brian Foster Dec. 22, 2020, 6:44 p.m. UTC | #5
On Tue, Dec 22, 2020 at 12:20:20PM -0500, Brian Foster wrote:
> ... and here's the second diff (applies on top of the first).
> 
> This one popped up after staring at the previous changes for a bit and
> wondering whether using "done flags" might make the whole thing easier
> to follow than incremental state transitions. I think the attr remove
> path is easy enough to follow with either method, but the attr set path
> is a beast and so this is more with that in mind. Initial thoughts?
> 

Eh, the more I stare at the attr set code, the less sure I am that this
by itself is much of an improvement. It helps in some areas, but there
are so many transaction rolls embedded throughout at different levels
that a larger rework of the code is probably still necessary. Anyway,
this was just a random thought for now.

Brian

> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 2e466c4ac283..106e3c070131 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -1271,14 +1271,12 @@ int xfs_attr_node_removename_setup(
>   * successful error code is returned.
>   */
>  STATIC int
> -xfs_attr_node_remove_step(
> -	struct xfs_delattr_context	*dac,
> -	bool				*joined)
> +xfs_attr_node_remove_rmt_step(
> +	struct xfs_delattr_context	*dac)
>  {
>  	struct xfs_da_args		*args = dac->da_args;
>  	struct xfs_da_state		*state = dac->da_state;
> -	struct xfs_da_state_blk		*blk;
> -	int				error = 0, retval, done;
> +	int				error, done;
>  
>  	/*
>  	 * If there is an out-of-line value, de-allocate the blocks.  This is
> @@ -1300,6 +1298,19 @@ xfs_attr_node_remove_step(
>  			return error;
>  	}
>  
> +	dac->dela_state |= XFS_DAS_RMT_DONE;
> +	return error;
> +}
> +
> +STATIC int
> +xfs_attr_node_remove_join_step(
> +	struct xfs_delattr_context	*dac)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_da_state		*state = dac->da_state;
> +	struct xfs_da_state_blk		*blk;
> +	int				error, retval;
> +
>  	/*
>  	 * Remove the name and update the hashvals in the tree.
>  	 */
> @@ -1317,9 +1328,12 @@ xfs_attr_node_remove_step(
>  		error = xfs_da3_join(state);
>  		if (error)
>  			return error;
> -		*joined = true;
> +
> +		error = -EAGAIN;
> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>  	}
>  
> +	dac->dela_state |= XFS_DAS_JOIN_DONE;
>  	return error;
>  }
>  
> @@ -1342,36 +1356,23 @@ xfs_attr_node_removename_iter(
>  	struct xfs_da_state		*state = dac->da_state;
>  	int				error;
>  	struct xfs_inode		*dp = args->dp;
> -	bool				joined = false;
>  
> -	switch (dac->dela_state) {
> -	case XFS_DAS_UNINIT:
> -		/*
> -		 * repeatedly remove remote blocks, remove the entry and join.
> -		 * returns -EAGAIN or 0 for completion of the step.
> -		 */
> -		error = xfs_attr_node_remove_step(dac, &joined);
> +	if (!(dac->dela_state & XFS_DAS_RMT_DONE)) {
> +		error = xfs_attr_node_remove_rmt_step(dac);
>  		if (error)
>  			goto out;
> -		if (joined) {
> -			dac->flags |= XFS_DAC_DEFER_FINISH;
> -			dac->dela_state = XFS_DAS_RM_SHRINK;
> -			return -EAGAIN;
> -		}
> -		/* fallthrough */
> -	case XFS_DAS_RM_SHRINK:
> -		/*
> -		 * If the result is small enough, push it all into the inode.
> -		 */
> -		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> -			error = xfs_attr_node_shrink(args, state);
> -		break;
> -	default:
> -		ASSERT(0);
> -		error = -EINVAL;
> -		goto out;
>  	}
>  
> +	if (!(dac->dela_state & XFS_DAS_JOIN_DONE)) {
> +		error = xfs_attr_node_remove_join_step(dac);
> +		if (error)
> +			goto out;
> +	}
> +
> +	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> +		error = xfs_attr_node_shrink(args, state);
> +	ASSERT(error != -EAGAIN);
> +
>  out:
>  	if (state && error != -EAGAIN)
>  		xfs_da_state_free(state);
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 3154ef4b7833..67e730cd3267 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -151,6 +151,9 @@ enum xfs_delattr_state {
>  	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
>  };
>  
> +#define XFS_DAS_RMT_DONE	0x1
> +#define XFS_DAS_JOIN_DONE	0x2
> +
>  /*
>   * Defines for xfs_delattr_context.flags
>   */
>
Chandan Babu R Dec. 23, 2020, 4:05 a.m. UTC | #6
On Tue, 22 Dec 2020 08:41:49 -0700, Allison Henderson wrote:
> 
> On 12/22/20 12:22 AM, Chandan Babu R wrote:
> > On Fri, 18 Dec 2020 00:29:06 -0700, Allison Henderson wrote:
> >> This patch modifies the attr remove routines to be delay ready. This
> >> means they no longer roll or commit transactions, but instead return
> >> -EAGAIN to have the calling routine roll and refresh the transaction. In
> >> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> >> uses a sort of state machine like switch to keep track of where it was
> >> when EAGAIN was returned. xfs_attr_node_removename has also been
> >> modified to use the switch, and a new version of xfs_attr_remove_args
> >> consists of a simple loop to refresh the transaction until the operation
> >> is completed. A new XFS_DAC_DEFER_FINISH flag is used to finish the
> > >> transaction wherever the existing code used to.
> >>
> >> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> >> version __xfs_attr_rmtval_remove. We will rename
> >> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> >> done.
> >>
> >> xfs_attr_rmtval_remove itself is still in use by the set routines (used
> > >> during a rename).  To preserve existing behavior, we modify
> > >> xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is set,
> > >> similar to what xfs_attr_remove_args does here.  Once we transition
> > >> the set routines to be delay ready, xfs_attr_rmtval_remove will no
> > >> longer be used and will be removed.
> >>
> >> This patch also adds a new struct xfs_delattr_context, which we will use
> >> to keep track of the current state of an attribute operation. The new
> >> xfs_delattr_state enum is used to track various operations that are in
> >> progress so that we know not to repeat them, and resume where we left
> >> off before EAGAIN was returned to cycle out the transaction. Other
> >> members take the place of local variables that need to retain their
> >> values across multiple function recalls.  See xfs_attr.h for a more
> >> detailed diagram of the states.
> >>
> >> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> >> ---
> >>   fs/xfs/libxfs/xfs_attr.c        | 218 +++++++++++++++++++++++++++++-----------
> >>   fs/xfs/libxfs/xfs_attr.h        | 100 ++++++++++++++++++
> >>   fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
> >>   fs/xfs/libxfs/xfs_attr_remote.c |  48 +++++----
> >>   fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
> >>   fs/xfs/xfs_attr_inactive.c      |   2 +-
> >>   6 files changed, 288 insertions(+), 84 deletions(-)
> >>
> >> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> >> index 1969b88..b6330f9 100644
> >> --- a/fs/xfs/libxfs/xfs_attr.c
> >> +++ b/fs/xfs/libxfs/xfs_attr.c
> >> @@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
> >>    */
> >>   STATIC int xfs_attr_node_get(xfs_da_args_t *args);
> >>   STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
> >> -STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
> >> +STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
> >>   STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
> >>   				 struct xfs_da_state **state);
> >>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> >> @@ -264,6 +264,34 @@ xfs_attr_set_shortform(
> >>   }
> >>   
> >>   /*
> >> + * Checks to see if a delayed attribute transaction should be rolled.  If so,
> >> + * also checks for a defer finish.  Transaction is finished and rolled as
> > >> + * needed, and returns true or false if the delayed operation should continue.
> >> + */
> >> +int
> >> +xfs_attr_trans_roll(
> >> +	struct xfs_delattr_context	*dac)
> >> +{
> >> +	struct xfs_da_args		*args = dac->da_args;
> >> +	int				error;
> >> +
> >> +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> >> +		/*
> >> +		 * The caller wants us to finish all the deferred ops so that we
> >> +		 * avoid pinning the log tail with a large number of deferred
> >> +		 * ops.
> >> +		 */
> >> +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> >> +		error = xfs_defer_finish(&args->trans);
> >> +		if (error)
> >> +			return error;
> >> +	} else
> >> +		error = xfs_trans_roll_inode(&args->trans, args->dp);
> >> +
> >> +	return error;
> >> +}
> >> +
> >> +/*
> >>    * Set the attribute specified in @args.
> >>    */
> >>   int
> >> @@ -364,23 +392,58 @@ xfs_has_attr(
> >>    */
> >>   int
> >>   xfs_attr_remove_args(
> >> -	struct xfs_da_args      *args)
> >> +	struct xfs_da_args	*args)
> >>   {
> >> -	struct xfs_inode	*dp = args->dp;
> >> -	int			error;
> >> +	int				error;
> >> +	struct xfs_delattr_context	dac = {
> >> +		.da_args	= args,
> >> +	};
> >> +
> >> +	do {
> >> +		error = xfs_attr_remove_iter(&dac);
> >> +		if (error != -EAGAIN)
> >> +			break;
> >> +
> >> +		error = xfs_attr_trans_roll(&dac);
> >> +		if (error)
> >> +			return error;
> >> +
> >> +	} while (true);
> >> +
> >> +	return error;
> >> +}
> >>   
> >> -	if (!xfs_inode_hasattr(dp)) {
> >> -		error = -ENOATTR;
> >> -	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
> >> +/*
> >> + * Remove the attribute specified in @args.
> >> + *
> >> + * This function may return -EAGAIN to signal that the transaction needs to be
> >> + * rolled.  Callers should continue calling this function until they receive a
> >> + * return value other than -EAGAIN.
> >> + */
> >> +int
> >> +xfs_attr_remove_iter(
> >> +	struct xfs_delattr_context	*dac)
> >> +{
> >> +	struct xfs_da_args		*args = dac->da_args;
> >> +	struct xfs_inode		*dp = args->dp;
> >> +
> >> +	/* If we are shrinking a node, resume shrink */
> >> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> >> +		goto node;
> >> +
> >> +	if (!xfs_inode_hasattr(dp))
> >> +		return -ENOATTR;
> >> +
> >> +	if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
> >>   		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> >> -		error = xfs_attr_shortform_remove(args);
> >> -	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> >> -		error = xfs_attr_leaf_removename(args);
> >> -	} else {
> >> -		error = xfs_attr_node_removename(args);
> >> +		return xfs_attr_shortform_remove(args);
> >>   	}
> >>   
> >> -	return error;
> >> +	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> >> +		return xfs_attr_leaf_removename(args);
> >> +node:
> >> +	/* If we are not short form or leaf, then proceed to remove node */
> >> +	return  xfs_attr_node_removename_iter(dac);
> >>   }
> >>   
> >>   /*
> >> @@ -1178,10 +1241,11 @@ xfs_attr_leaf_mark_incomplete(
> >>    */
> >>   STATIC
> >>   int xfs_attr_node_removename_setup(
> >> -	struct xfs_da_args	*args,
> >> -	struct xfs_da_state	**state)
> >> +	struct xfs_delattr_context	*dac)
> >>   {
> > 
> > In xfs_attr_node_removename_setup(), if either of
> > xfs_attr_leaf_mark_incomplete() or xfs_attr_rmtval_invalidate() returns with a
> > non-zero value, the memory pointed to by dac->da_state is not freed. This
> > happens because the caller (i.e. xfs_attr_node_removename_iter()) checks for
> > the non-NULL value of its local variable "state" to actually free the
> > corresponding memory.
> > 
> Ok, for this one I think it makes more sense to put an extra free in
> the helper rather than have the caller handle it.  Will fix.
> 
> Do you have a tool that's tracing this out, or is it just by hand?
> Because if it's a tool, I should probably be using it too :-)
>

Unfortunately, I found this by reading through the code changes. Tools to
figure these out would be great to have since they would let us focus mostly
on the larger picture.
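
For anyone following along, the pattern behind the leak can be reduced to
a small userspace sketch. This is not the XFS code: struct ctx, setup()
and iter() are hypothetical names, and live_states is only a counter to
make the leak observable. The helper stores the allocation in the
context, while the caller's cleanup tests a local pointer that is still
NULL when the helper fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct state { int unused; };
struct ctx { struct state *st; };	/* plays the role of dac->da_state */

static int live_states;			/* outstanding allocations */

static struct state *state_alloc(void)
{
	struct state *st = calloc(1, sizeof(*st));

	if (st)
		live_states++;
	return st;
}

static void state_free(struct state *st)
{
	assert(live_states > 0);
	free(st);
	live_states--;
}

/*
 * Allocates ctx->st, then runs a later step that may fail.  With
 * free_on_error set, the helper cleans up after itself (the proposed fix).
 */
static int setup(struct ctx *ctx, int later_step_error, int free_on_error)
{
	ctx->st = state_alloc();
	if (!ctx->st)
		return -ENOMEM;

	if (later_step_error && free_on_error) {
		state_free(ctx->st);
		ctx->st = NULL;
	}
	return later_step_error;
}

static int iter(struct ctx *ctx, int later_step_error, int free_on_error)
{
	struct state *st = NULL;	/* local copy, assigned only after setup */
	int error;

	error = setup(ctx, later_step_error, free_on_error);
	if (error)
		goto out;		/* st is still NULL here */
	st = ctx->st;
	/* ... the real removal work would happen here ... */
out:
	if (st) {
		state_free(st);
		ctx->st = NULL;
	}
	return error;
}
```

Calling iter() with a failing later step and free_on_error == 0 leaves
live_states at 1. With free_on_error == 1 it drops back to 0, which is
the "extra free in the helper" approach.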

> Thanks!
> Allison
> 
> 
> >> -	int			error;
> >> +	struct xfs_da_args		*args = dac->da_args;
> >> +	struct xfs_da_state		**state = &dac->da_state;
> >> +	int				error;
> >>   
> >>   	error = xfs_attr_node_hasname(args, state);
> >>   	if (error != -EEXIST)
> >> @@ -1203,13 +1267,16 @@ int xfs_attr_node_removename_setup(
> >>   }
> >>   
> >>   STATIC int
> >> -xfs_attr_node_remove_rmt(
> >> -	struct xfs_da_args	*args,
> >> -	struct xfs_da_state	*state)
> >> +xfs_attr_node_remove_rmt (
> >> +	struct xfs_delattr_context	*dac,
> >> +	struct xfs_da_state		*state)
> >>   {
> >> -	int			error = 0;
> >> +	int				error = 0;
> >>   
> >> -	error = xfs_attr_rmtval_remove(args);
> >> +	/*
> >> +	 * May return -EAGAIN to request that the caller recall this function
> >> +	 */
> >> +	error = __xfs_attr_rmtval_remove(dac);
> >>   	if (error)
> >>   		return error;
> >>   
> >> @@ -1240,28 +1307,34 @@ xfs_attr_node_remove_cleanup(
> >>   }
> >>   
> >>   /*
> >> - * Remove a name from a B-tree attribute list.
> > >> + * Step through removing a name from a B-tree attribute list.
> >>    *
> >>    * This will involve walking down the Btree, and may involve joining
> >>    * leaf nodes and even joining intermediate nodes up to and including
> >>    * the root node (a special case of an intermediate node).
> >> + *
> >> + * This routine is meant to function as either an inline or delayed operation,
> >> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> >> + * functions will need to handle this, and recall the function until a
> >> + * successful error code is returned.
> >>    */
> >>   STATIC int
> >>   xfs_attr_node_remove_step(
> >> -	struct xfs_da_args	*args,
> >> -	struct xfs_da_state	*state)
> >> +	struct xfs_delattr_context	*dac)
> >>   {
> >> -	int			error;
> >> -	struct xfs_inode	*dp = args->dp;
> >> -
> >> -
> >> +	struct xfs_da_args		*args = dac->da_args;
> >> +	struct xfs_da_state		*state = dac->da_state;
> >> +	int				error = 0;
> >>   	/*
> >>   	 * If there is an out-of-line value, de-allocate the blocks.
> >>   	 * This is done before we remove the attribute so that we don't
> >>   	 * overflow the maximum size of a transaction and/or hit a deadlock.
> >>   	 */
> >>   	if (args->rmtblkno > 0) {
> >> -		error = xfs_attr_node_remove_rmt(args, state);
> >> +		/*
> >> +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
> >> +		 */
> >> +		error = xfs_attr_node_remove_rmt(dac, state);
> >>   		if (error)
> >>   			return error;
> >>   	}
> >> @@ -1274,51 +1347,74 @@ xfs_attr_node_remove_step(
> >>    *
> >>    * This routine will find the blocks of the name to remove, remove them and
> >>    * shrink the tree if needed.
> >> + *
> >> + * This routine is meant to function as either an inline or delayed operation,
> >> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> >> + * functions will need to handle this, and recall the function until a
> >> + * successful error code is returned.
> >>    */
> >>   STATIC int
> >> -xfs_attr_node_removename(
> >> -	struct xfs_da_args	*args)
> >> +xfs_attr_node_removename_iter(
> >> +	struct xfs_delattr_context	*dac)
> >>   {
> >> -	struct xfs_da_state	*state = NULL;
> >> -	int			retval, error;
> >> -	struct xfs_inode	*dp = args->dp;
> >> +	struct xfs_da_args		*args = dac->da_args;
> >> +	struct xfs_da_state		*state = NULL;
> >> +	int				retval, error;
> >> +	struct xfs_inode		*dp = args->dp;
> >>   
> >>   	trace_xfs_attr_node_removename(args);
> >>   
> >> -	error = xfs_attr_node_removename_setup(args, &state);
> >> -	if (error)
> >> -		goto out;
> >> +	if (!dac->da_state) {
> >> +		error = xfs_attr_node_removename_setup(dac);
> >> +		if (error)
> >> +			goto out;
> >> +	}
> >> +	state = dac->da_state;
> >>   
> >> -	error = xfs_attr_node_remove_step(args, state);
> >> -	if (error)
> >> -		goto out;
> >> +	switch (dac->dela_state) {
> >> +	case XFS_DAS_UNINIT:
> >> +		/*
> >> +		 * repeatedly remove remote blocks, remove the entry and join.
> >> +		 * returns -EAGAIN or 0 for completion of the step.
> >> +		 */
> >> +		error = xfs_attr_node_remove_step(dac);
> >> +		if (error)
> >> +			break;
> >>   
> >> -	retval = xfs_attr_node_remove_cleanup(args, state);
> >> +		retval = xfs_attr_node_remove_cleanup(args, state);
> >>   
> >> -	/*
> >> -	 * Check to see if the tree needs to be collapsed.
> >> -	 */
> >> -	if (retval && (state->path.active > 1)) {
> >> -		error = xfs_da3_join(state);
> >> -		if (error)
> >> -			return error;
> >> -		error = xfs_defer_finish(&args->trans);
> >> -		if (error)
> >> -			return error;
> >>   		/*
> >> -		 * Commit the Btree join operation and start a new trans.
> > >> +		 * Check to see if the tree needs to be collapsed. Set the flag
> > >> +		 * to indicate that the calling function needs to move to the
> > >> +		 * shrink operation.
> >>   		 */
> >> -		error = xfs_trans_roll_inode(&args->trans, dp);
> >> -		if (error)
> >> -			return error;
> >> -	}
> >> +		if (retval && (state->path.active > 1)) {
> >> +			error = xfs_da3_join(state);
> >> +			if (error)
> >> +				return error;
> >>   
> >> -	/*
> >> -	 * If the result is small enough, push it all into the inode.
> >> -	 */
> >> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> >> -		error = xfs_attr_node_shrink(args, state);
> >> +			dac->flags |= XFS_DAC_DEFER_FINISH;
> >> +			dac->dela_state = XFS_DAS_RM_SHRINK;
> >> +			return -EAGAIN;
> >> +		}
> >> +
> >> +		/* fallthrough */
> >> +	case XFS_DAS_RM_SHRINK:
> >> +		/*
> >> +		 * If the result is small enough, push it all into the inode.
> >> +		 */
> >> +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> >> +			error = xfs_attr_node_shrink(args, state);
> >> +
> >> +		break;
> >> +	default:
> >> +		ASSERT(0);
> >> +		error = -EINVAL;
> >> +		goto out;
> >> +	}
> >>   
> >> +	if (error == -EAGAIN)
> >> +		return error;
> >>   out:
> >>   	if (state)
> >>   		xfs_da_state_free(state);
> >> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> >> index 3e97a93..3154ef4 100644
> >> --- a/fs/xfs/libxfs/xfs_attr.h
> >> +++ b/fs/xfs/libxfs/xfs_attr.h
> >> @@ -74,6 +74,102 @@ struct xfs_attr_list_context {
> >>   };
> >>   
> >>   
> >> +/*
> >> + * ========================================================================
> >> + * Structure used to pass context around among the delayed routines.
> >> + * ========================================================================
> >> + */
> >> +
> >> +/*
> >> + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
> >> + * states indicate places where the function would return -EAGAIN, and then
> >> + * immediately resume from after being recalled by the calling function. States
> >> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> >> + * so the calling function needs to pass them back to that subroutine to allow
> >> + * it to finish where it left off. But they otherwise do not have a role in the
> >> + * calling function other than just passing through.
> >> + *
> >> + * xfs_attr_remove_iter()
> >> + *              │
> >> + *              v
> >> + *        found attr blks? ───n──┐
> >> + *              │                v
> >> + *              │         find and invalidate
> >> + *              y         the blocks. mark
> >> + *              │         attr incomplete
> >> + *              ├────────────────┘
> >> + *              │
> >> + *              v
> >> + *      remove a block with
> >> + *    xfs_attr_node_remove_step <────┐
> >> + *              │                    │
> >> + *              v                    │
> >> + *      still have blks ──y──> return -EAGAIN.
> >> + *        to remove?          re-enter with one
> >> + *              │            less blk to remove
> >> + *              n
> >> + *              │
> >> + *              v
> >> + *       remove leaf and
> >> + *       update hash with
> >> + *   xfs_attr_node_remove_cleanup
> >> + *              │
> >> + *              v
> >> + *           need to
> >> + *        shrink tree? ─n─┐
> >> + *              │         │
> >> + *              y         │
> >> + *              │         │
> >> + *              v         │
> >> + *          join leaf     │
> >> + *              │         │
> >> + *              v         │
> >> + *      XFS_DAS_RM_SHRINK │
> >> + *              │         │
> >> + *              v         │
> >> + *       do the shrink    │
> >> + *              │         │
> >> + *              v         │
> >> + *          free state <──┘
> >> + *              │
> >> + *              v
> >> + *            done
> >> + *
> >> + */
> >> +
> >> +/*
> > >> + * Enum values for xfs_delattr_context.dela_state
> >> + *
> >> + * These values are used by delayed attribute operations to keep track  of where
> >> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> >> + * calling function to roll the transaction, and then recall the subroutine to
> >> + * finish the operation.  The enum is then used by the subroutine to jump back
> >> + * to where it was and resume executing where it left off.
> >> + */
> >> +enum xfs_delattr_state {
> >> +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
> >> +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
> >> +};
> >> +
> >> +/*
> >> + * Defines for xfs_delattr_context.flags
> >> + */
> >> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> >> +
> >> +/*
> >> + * Context used for keeping track of delayed attribute operations
> >> + */
> >> +struct xfs_delattr_context {
> >> +	struct xfs_da_args      *da_args;
> >> +
> >> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> >> +	struct xfs_da_state     *da_state;
> >> +
> >> +	/* Used to keep track of current state of delayed operation */
> >> +	unsigned int            flags;
> >> +	enum xfs_delattr_state  dela_state;
> >> +};
> >> +
> >>   /*========================================================================
> >>    * Function prototypes for the kernel.
> >>    *========================================================================*/
> >> @@ -91,6 +187,10 @@ int xfs_attr_set(struct xfs_da_args *args);
> >>   int xfs_attr_set_args(struct xfs_da_args *args);
> >>   int xfs_has_attr(struct xfs_da_args *args);
> >>   int xfs_attr_remove_args(struct xfs_da_args *args);
> >> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> >> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
> >>   bool xfs_attr_namecheck(const void *name, size_t length);
> >> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> >> +			      struct xfs_da_args *args);
> >>   
> >>   #endif	/* __XFS_ATTR_H__ */
> >> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> >> index d6ef69a..3780141 100644
> >> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> >> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> >> @@ -19,8 +19,8 @@
> >>   #include "xfs_bmap_btree.h"
> >>   #include "xfs_bmap.h"
> >>   #include "xfs_attr_sf.h"
> >> -#include "xfs_attr_remote.h"
> >>   #include "xfs_attr.h"
> >> +#include "xfs_attr_remote.h"
> >>   #include "xfs_attr_leaf.h"
> >>   #include "xfs_error.h"
> >>   #include "xfs_trace.h"
> >> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> >> index 48d8e9c..f09820c 100644
> >> --- a/fs/xfs/libxfs/xfs_attr_remote.c
> >> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> >> @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
> >>    */
> >>   int
> >>   xfs_attr_rmtval_remove(
> >> -	struct xfs_da_args      *args)
> >> +	struct xfs_da_args		*args)
> >>   {
> >> -	int			error;
> >> -	int			retval;
> >> +	int				error;
> >> +	struct xfs_delattr_context	dac  = {
> >> +		.da_args	= args,
> >> +	};
> >>   
> >>   	trace_xfs_attr_rmtval_remove(args);
> >>   
> >> @@ -685,31 +687,29 @@ xfs_attr_rmtval_remove(
> >>   	 * Keep de-allocating extents until the remote-value region is gone.
> >>   	 */
> >>   	do {
> >> -		retval = __xfs_attr_rmtval_remove(args);
> >> -		if (retval && retval != -EAGAIN)
> >> -			return retval;
> >> +		error = __xfs_attr_rmtval_remove(&dac);
> >> +		if (error != -EAGAIN)
> >> +			break;
> >>   
> >> -		/*
> >> -		 * Close out trans and start the next one in the chain.
> >> -		 */
> >> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> >> +		error = xfs_attr_trans_roll(&dac);
> >>   		if (error)
> >>   			return error;
> >> -	} while (retval == -EAGAIN);
> >> +	} while (true);
> >>   
> >> -	return 0;
> >> +	return error;
> >>   }
> >>   
> >>   /*
> >>    * Remove the value associated with an attribute by deleting the out-of-line
> >> - * buffer that it is stored on. Returns EAGAIN for the caller to refresh the
> >> + * buffer that it is stored on. Returns -EAGAIN for the caller to refresh the
> >>    * transaction and re-call the function
> >>    */
> >>   int
> >>   __xfs_attr_rmtval_remove(
> >> -	struct xfs_da_args	*args)
> >> +	struct xfs_delattr_context	*dac)
> >>   {
> >> -	int			error, done;
> >> +	struct xfs_da_args		*args = dac->da_args;
> >> +	int				error, done;
> >>   
> >>   	/*
> >>   	 * Unmap value blocks for this attr.
> >> @@ -719,12 +719,20 @@ __xfs_attr_rmtval_remove(
> >>   	if (error)
> >>   		return error;
> >>   
> >> -	error = xfs_defer_finish(&args->trans);
> >> -	if (error)
> >> -		return error;
> >> -
> >> -	if (!done)
> >> +	/*
> > >> +	 * We don't need an explicit state here to pick up where we left off.  We
> > >> +	 * can figure it out using the !done return code.  The calling function
> > >> +	 * only needs to keep recalling this routine until we indicate to stop by
> > >> +	 * returning anything other than -EAGAIN. The actual value of
> > >> +	 * dac->dela_state may be some value reminiscent of the calling
> > >> +	 * function, but its value is irrelevant within the context of this
> > >> +	 * function.  Once we are done here, the next state is set as needed
> > >> +	 * by the parent.
> >> +	 */
> >> +	if (!done) {
> >> +		dac->flags |= XFS_DAC_DEFER_FINISH;
> >>   		return -EAGAIN;
> >> +	}
> >>   
> >>   	return error;
> >>   }
> >> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> >> index 9eee615..002fd30 100644
> >> --- a/fs/xfs/libxfs/xfs_attr_remote.h
> >> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> >> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
> >>   int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
> >>   		xfs_buf_flags_t incore_flags);
> >>   int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> >> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> >> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
> >>   #endif /* __XFS_ATTR_REMOTE_H__ */
> >> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> >> index bfad669..aaa7e66 100644
> >> --- a/fs/xfs/xfs_attr_inactive.c
> >> +++ b/fs/xfs/xfs_attr_inactive.c
> >> @@ -15,10 +15,10 @@
> >>   #include "xfs_da_format.h"
> >>   #include "xfs_da_btree.h"
> >>   #include "xfs_inode.h"
> >> +#include "xfs_attr.h"
> >>   #include "xfs_attr_remote.h"
> >>   #include "xfs_trans.h"
> >>   #include "xfs_bmap.h"
> >> -#include "xfs_attr.h"
> >>   #include "xfs_attr_leaf.h"
> >>   #include "xfs_quota.h"
> >>   #include "xfs_dir2.h"
> >>
> > 
> > 
>
Allison Henderson Dec. 23, 2020, 5:20 a.m. UTC | #7
On 12/22/20 11:44 AM, Brian Foster wrote:
> On Tue, Dec 22, 2020 at 12:20:20PM -0500, Brian Foster wrote:
>> On Tue, Dec 22, 2020 at 12:11:48PM -0500, Brian Foster wrote:
>>> On Fri, Dec 18, 2020 at 12:29:06AM -0700, Allison Henderson wrote:
>>>> This patch modifies the attr remove routines to be delay ready. This
>>>> means they no longer roll or commit transactions, but instead return
>>>> -EAGAIN to have the calling routine roll and refresh the transaction. In
>>>> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
>>>> uses a sort of state machine like switch to keep track of where it was
>>>> when EAGAIN was returned. xfs_attr_node_removename has also been
>>>> modified to use the switch, and a new version of xfs_attr_remove_args
>>>> consists of a simple loop to refresh the transaction until the operation
>>>> is completed. A new XFS_DAC_DEFER_FINISH flag is used to finish the
>>>> transaction wherever the existing code used to.
>>>>
>>>> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
>>>> version __xfs_attr_rmtval_remove. We will rename
>>>> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
>>>> done.
>>>>
>>>> xfs_attr_rmtval_remove itself is still in use by the set routines (used
>>>> during a rename).  To preserve existing behavior, we modify
>>>> xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is set,
>>>> similar to what xfs_attr_remove_args does here.  Once we transition
>>>> the set routines to be delay ready, xfs_attr_rmtval_remove will no
>>>> longer be used and will be removed.
>>>>
>>>> This patch also adds a new struct xfs_delattr_context, which we will use
>>>> to keep track of the current state of an attribute operation. The new
>>>> xfs_delattr_state enum is used to track various operations that are in
>>>> progress so that we know not to repeat them, and resume where we left
>>>> off before EAGAIN was returned to cycle out the transaction. Other
>>>> members take the place of local variables that need to retain their
>>>> values across multiple function recalls.  See xfs_attr.h for a more
>>>> detailed diagram of the states.
>>>>
>>>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>>>> ---
>>>
>>> I started with a couple small comments on this patch but inevitably
>>> started thinking more about the factoring again and ended up with a
>>> couple patches on top. The first is more of some small tweaks and
>>> open-coding that IMO makes this patch a bit easier to follow. The
>>> second is more of an RFC so I'll follow up with that in a second email.
>>> I'm curious what folks' thoughts might be on either. Also note that I'm
>>> primarily focusing on code structure and whatnot here, so these are fast
>>> and loose, compile tested only and likely to be broken.
>>>
>>
>> ... and here's the second diff (applies on top of the first).
>>
>> This one popped up after staring at the previous changes for a bit and
>> wondering whether using "done flags" might make the whole thing easier
>> to follow than incremental state transitions. I think the attr remove
>> path is easy enough to follow with either method, but the attr set path
>> is a beast and so this is more with that in mind. Initial thoughts?
>>
> 
> Eh, the more I stare at the attr set code, the less sure I am that this by
> itself is much of an improvement. It helps in some areas, but there are so
> many transaction rolls embedded throughout at different levels that a larger
> rework of the code is probably still necessary. Anyway, this was just a
> random thought for now.
> 
> Brian

No worries, I know the feeling :-)  The set works and all, but I do
think there is a struggle around finding a particularly pleasant
looking presentation of it, especially in the set path, which is a bit
more complex.  I may go through the patches you have here and pick up
the whitespace cleanups and other style adjustments if people prefer it
that way.  The good news is that a lot of the *_args routines are
supposed to disappear at the end of the set, so there's not really a
need to invest too much in them, I suppose.  It may help to jump to the
"Set up infrastructure" patch too.  I've expanded the diagram to help
illustrate the code flow a bit, so that may make the code easier to
follow.
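
In case it helps with following the flow, here is a stripped-down
userspace sketch of the switch-based version and its caller loop. It only
mirrors the shape of xfs_attr_remove_args, xfs_attr_trans_roll and the
iter routine in this patch: struct dac, the DAS_* and DAC_* names and the
counters are simplified stand-ins, and no real transactions are involved.

```c
#include <assert.h>
#include <errno.h>

enum das_state { DAS_UNINIT = 0, DAS_RM_SHRINK };
#define DAC_DEFER_FINISH	0x01

struct dac {
	enum das_state	state;
	unsigned int	flags;
	int		need_join;	/* nonzero if the tree must be joined */
	int		finishes;	/* times deferred ops were finished */
	int		rolls;		/* times a plain roll was done */
	int		shrunk;		/* set once the shrink step ran */
};

/* Resume from the saved state; -EAGAIN asks the caller to roll. */
static int remove_iter(struct dac *dac)
{
	switch (dac->state) {
	case DAS_UNINIT:
		if (dac->need_join) {
			dac->flags |= DAC_DEFER_FINISH;
			dac->state = DAS_RM_SHRINK;
			return -EAGAIN;
		}
		/* fallthrough */
	case DAS_RM_SHRINK:
		dac->shrunk = 1;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Finish deferred ops if requested, otherwise do a plain roll. */
static int trans_roll(struct dac *dac)
{
	if (dac->flags & DAC_DEFER_FINISH) {
		dac->flags &= ~DAC_DEFER_FINISH;
		dac->finishes++;
	} else {
		dac->rolls++;
	}
	return 0;
}

/* Keep calling the iterator until it returns anything but -EAGAIN. */
static int remove_args(struct dac *dac)
{
	int error;

	do {
		error = remove_iter(dac);
		if (error != -EAGAIN)
			break;
		error = trans_roll(dac);
		if (error)
			return error;
	} while (1);

	assert(error != -EAGAIN);
	return error;
}
```

With need_join set, the iterator returns -EAGAIN once, the caller
finishes the deferred ops, and the second call resumes directly at the
shrink case.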

Allison

> 
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index 2e466c4ac283..106e3c070131 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -1271,14 +1271,12 @@ int xfs_attr_node_removename_setup(
>>    * successful error code is returned.
>>    */
>>   STATIC int
>> -xfs_attr_node_remove_step(
>> -	struct xfs_delattr_context	*dac,
>> -	bool				*joined)
>> +xfs_attr_node_remove_rmt_step(
>> +	struct xfs_delattr_context	*dac)
>>   {
>>   	struct xfs_da_args		*args = dac->da_args;
>>   	struct xfs_da_state		*state = dac->da_state;
>> -	struct xfs_da_state_blk		*blk;
>> -	int				error = 0, retval, done;
>> +	int				error, done;
>>   
>>   	/*
>>   	 * If there is an out-of-line value, de-allocate the blocks.  This is
>> @@ -1300,6 +1298,19 @@ xfs_attr_node_remove_step(
>>   			return error;
>>   	}
>>   
>> +	dac->dela_state |= XFS_DAS_RMT_DONE;
>> +	return error;
>> +}
>> +
>> +STATIC int
>> +xfs_attr_node_remove_join_step(
>> +	struct xfs_delattr_context	*dac)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_da_state		*state = dac->da_state;
>> +	struct xfs_da_state_blk		*blk;
>> +	int				error, retval;
>> +
>>   	/*
>>   	 * Remove the name and update the hashvals in the tree.
>>   	 */
>> @@ -1317,9 +1328,12 @@ xfs_attr_node_remove_step(
>>   		error = xfs_da3_join(state);
>>   		if (error)
>>   			return error;
>> -		*joined = true;
>> +
>> +		error = -EAGAIN;
>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>   	}
>>   
>> +	dac->dela_state |= XFS_DAS_JOIN_DONE;
>>   	return error;
>>   }
>>   
>> @@ -1342,36 +1356,23 @@ xfs_attr_node_removename_iter(
>>   	struct xfs_da_state		*state = dac->da_state;
>>   	int				error;
>>   	struct xfs_inode		*dp = args->dp;
>> -	bool				joined = false;
>>   
>> -	switch (dac->dela_state) {
>> -	case XFS_DAS_UNINIT:
>> -		/*
>> -		 * repeatedly remove remote blocks, remove the entry and join.
>> -		 * returns -EAGAIN or 0 for completion of the step.
>> -		 */
>> -		error = xfs_attr_node_remove_step(dac, &joined);
>> +	if (!(dac->dela_state & XFS_DAS_RMT_DONE)) {
>> +		error = xfs_attr_node_remove_rmt_step(dac);
>>   		if (error)
>>   			goto out;
>> -		if (joined) {
>> -			dac->flags |= XFS_DAC_DEFER_FINISH;
>> -			dac->dela_state = XFS_DAS_RM_SHRINK;
>> -			return -EAGAIN;
>> -		}
>> -		/* fallthrough */
>> -	case XFS_DAS_RM_SHRINK:
>> -		/*
>> -		 * If the result is small enough, push it all into the inode.
>> -		 */
>> -		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>> -			error = xfs_attr_node_shrink(args, state);
>> -		break;
>> -	default:
>> -		ASSERT(0);
>> -		error = -EINVAL;
>> -		goto out;
>>   	}
>>   
>> +	if (!(dac->dela_state & XFS_DAS_JOIN_DONE)) {
>> +		error = xfs_attr_node_remove_join_step(dac);
>> +		if (error)
>> +			goto out;
>> +	}
>> +
>> +	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>> +		error = xfs_attr_node_shrink(args, state);
>> +	ASSERT(error != -EAGAIN);
>> +
>>   out:
>>   	if (state && error != -EAGAIN)
>>   		xfs_da_state_free(state);
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index 3154ef4b7833..67e730cd3267 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -151,6 +151,9 @@ enum xfs_delattr_state {
>>   	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
>>   };
>>   
>> +#define XFS_DAS_RMT_DONE	0x1
>> +#define XFS_DAS_JOIN_DONE	0x2
>> +
>>   /*
>>    * Defines for xfs_delattr_context.flags
>>    */
>>
>
Brian Foster Dec. 23, 2020, 2:16 p.m. UTC | #8
On Tue, Dec 22, 2020 at 10:20:16PM -0700, Allison Henderson wrote:
> 
> 
> On 12/22/20 11:44 AM, Brian Foster wrote:
> > On Tue, Dec 22, 2020 at 12:20:20PM -0500, Brian Foster wrote:
> > > On Tue, Dec 22, 2020 at 12:11:48PM -0500, Brian Foster wrote:
> > > > On Fri, Dec 18, 2020 at 12:29:06AM -0700, Allison Henderson wrote:
> > > > > This patch modifies the attr remove routines to be delay ready. This
> > > > > means they no longer roll or commit transactions, but instead return
> > > > > -EAGAIN to have the calling routine roll and refresh the transaction. In
> > > > > this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> > > > > uses a sort of state machine like switch to keep track of where it was
> > > > > when EAGAIN was returned. xfs_attr_node_removename has also been
> > > > > modified to use the switch, and a new version of xfs_attr_remove_args
> > > > > consists of a simple loop to refresh the transaction until the operation
> > > > > is completed. A new XFS_DAC_DEFER_FINISH flag is used to finish the
> > > > > > transaction wherever the existing code used to.
> > > > > 
> > > > > Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> > > > > version __xfs_attr_rmtval_remove. We will rename
> > > > > __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> > > > > done.
> > > > > 
> > > > > xfs_attr_rmtval_remove itself is still in use by the set routines (used
> > > > > during a rename).  For reasons of preserving existing function, we
> > > > > modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> > > > > set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> > > > > the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> > > > > used and will be removed.
> > > > > 
> > > > > This patch also adds a new struct xfs_delattr_context, which we will use
> > > > > to keep track of the current state of an attribute operation. The new
> > > > > xfs_delattr_state enum is used to track various operations that are in
> > > > > progress so that we know not to repeat them, and resume where we left
> > > > > off before EAGAIN was returned to cycle out the transaction. Other
> > > > > members take the place of local variables that need to retain their
> > > > > values across multiple function recalls.  See xfs_attr.h for a more
> > > > > detailed diagram of the states.
> > > > > 
> > > > > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > > > > ---
> > > > 
> > > > I started with a couple small comments on this patch but inevitably
> > > > started thinking more about the factoring again and ended up with a
> > > > couple patches on top. The first is more of some small tweaks and
> > > > open-coding that IMO makes this patch a bit easier to follow. The
> > > > second is more of an RFC so I'll follow up with that in a second email.
> > > > I'm curious what folks' thoughts might be on either. Also note that I'm
> > > > primarily focusing on code structure and whatnot here, so these are fast
> > > > and loose, compile tested only and likely to be broken.
> > > > 
> > > 
> > > ... and here's the second diff (applies on top of the first).
> > > 
> > > This one popped up after staring at the previous changes for a bit and
> > > wondering whether using "done flags" might make the whole thing easier
> > > to follow than incremental state transitions. I think the attr remove
> > > path is easy enough to follow with either method, but the attr set path
> > > is a beast and so this is more with that in mind. Initial thoughts?
> > > 
> > 
> > Eh, the more I stare at the attr set code I'm not sure this by itself is
> > much of an improvement. It helps in some areas, but there are so many
> > transaction rolls embedded throughout at different levels that a larger
> > rework of the code is probably still necessary. Anyways, this was just a
> > random thought for now..
> > 
> > Brian
> 
> No worries, I know the feeling :-)  The set works and all, but I do think
> there is a struggle around trying to find a particularly pleasant looking
> presentation of it.  Especially when we get into the set path, it's a bit
> more complex.  I may pick through the patches you have here and pick up the
> whitespace cleanups and other style adjustments if people prefer it that
> way.  The good news is, a lot of the *_args routines are supposed to
> disappear at the end of the set, so there's not really a need to invest too
> much in them I suppose. It may help to jump to the "Set up infrastructure"
> patch too.  I've expanded the diagram to try and help illustrate the code
> flow a bit, so that may make it easier to follow.
> 

I'm sure.. :P Note that the first patch was more about smaller tweaks and
refactoring with the existing model in mind. For the set path, the
challenge IMO is to make the code generally more readable. I think the
remove path accomplishes this for the most part because the states and
whatnot are fairly low overhead on top of the existing complexity. This
changes considerably for the set path, not so much due to the mechanism
but because the baseline code is so fragmented and complex from the
start. I am slightly concerned that bolting state management onto the
current code as such might make it harder to grok and clean up after the
fact, but I could be wrong about that (my hope was certainly for the
opposite).
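
For anyone following along, the "done flags" idea from the earlier diff
can be sketched in plain userspace C. This is only an illustration under
stated assumptions: every mock_* name and MOCK_* flag below is
hypothetical, nothing here is kernel code, and the "transaction roll" is
reduced to the caller simply calling the function again.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flags, modelled on the XFS_DAS_*_DONE bits in the quoted diff. */
#define MOCK_RMT_DONE	0x1
#define MOCK_JOIN_DONE	0x2

/* Hypothetical context; stands in for the state kept across re-calls. */
struct mock_dac {
	unsigned int	done_flags;	/* steps that have already completed */
	int		rmt_blocks;	/* remote blocks still to be freed */
	int		shrunk;		/* set once the final step has run */
};

/* Free one remote block per call; return -EAGAIN while more remain. */
static int mock_rmt_step(struct mock_dac *dac)
{
	if (dac->rmt_blocks > 0) {
		dac->rmt_blocks--;
		if (dac->rmt_blocks > 0)
			return -EAGAIN;
	}
	dac->done_flags |= MOCK_RMT_DONE;
	return 0;
}

/* Pretend to remove the entry and join; always asks for one more roll. */
static int mock_join_step(struct mock_dac *dac)
{
	dac->done_flags |= MOCK_JOIN_DONE;
	return -EAGAIN;
}

/* Resumable operation: completed steps are skipped on re-entry. */
static int mock_remove_iter(struct mock_dac *dac)
{
	int error;

	if (!(dac->done_flags & MOCK_RMT_DONE)) {
		error = mock_rmt_step(dac);
		if (error)
			return error;
	}
	if (!(dac->done_flags & MOCK_JOIN_DONE)) {
		error = mock_join_step(dac);
		if (error)
			return error;
	}
	dac->shrunk = 1;
	return 0;
}

/* Caller: re-invoke on -EAGAIN.  Returns the call count, or an error. */
static int mock_remove_args(struct mock_dac *dac)
{
	int calls = 0, error;

	do {
		error = mock_remove_iter(dac);
		calls++;
	} while (error == -EAGAIN);
	return error ? error : calls;
}
```

With two remote blocks outstanding, the iterator is entered three times:
once to free the first block, once to free the second and do the join,
and once to run the final step.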

Regardless, that had me shifting focus a bit and playing around with the
current upstream code as opposed to shifting around your code. ISTM that
there is some commonality across the various set codepaths and perhaps
there is potential to simplify things notably _before_ applying the
state management scheme. I've appended a new diff below (based on
for-next) that starts to demonstrate what I mean. Note again that this
is similarly fast and loose as I've knowingly thrown away some quirks of
the code (e.g. leaf buffer bhold) for the purpose of quickly trying to
explore/POC whether the factoring might be sane and plausible.

In summary, this combines the "try addname" part of each xattr format to
fall under a single transaction rolling loop such that I think the
resulting function could become one high level state. I ran out of time
for working through the rest, but from a read through it seems there's
at least a chance we could continue with similar refactoring and
reduction to a fewer number of generic states (vs. more format-specific
states). For example, the remaining parts of the set operation all seem
to have something along the lines of the following high level
components:

- remote value block allocation (and value set)
- if rename == false, clear flag and done
- if rename == true, flip flags
	- remove old xattr (i.e., similar to xattr remove)

... where much of that code looks remarkably similar across the
different leaf/node code branches. So I'm curious what you and others
following along might think about something like this as an intermediate
step...
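
As a rough userspace sketch of the "single transaction rolling loop"
described above: the format step either finishes or converts to the next
format and returns -EAGAIN, and the caller rolls and retries. All names
(mock_args, mock_set_fmt, mock_roll, mock_set_args) and the free_slots
model are assumptions made for illustration; the real code operates on
struct xfs_da_args and rolls a real transaction, and this mock treats any
successful add as "done", which is simpler than the diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the attr fork formats; not kernel types. */
enum mock_fmt { FMT_SHORTFORM, FMT_LEAF, FMT_NODE };

struct mock_args {
	enum mock_fmt	fmt;		/* current (mock) attr fork format */
	int		free_slots;	/* room left in the current format */
	int		rolls;		/* number of mock transaction rolls */
};

/*
 * One "try addname" pass.  Returns 0 once the name fits, or -EAGAIN after
 * converting to the next larger format so the caller can roll and retry.
 */
static int mock_set_fmt(struct mock_args *args, bool *done)
{
	if (args->fmt == FMT_NODE || args->free_slots > 0) {
		*done = true;
		return 0;
	}

	/* No room: convert shortform -> leaf -> node, then ask for a roll. */
	args->fmt++;
	args->free_slots = (args->fmt == FMT_LEAF) ? 0 : 1;
	return -EAGAIN;
}

/* Stand-in for finishing deferred work and rolling the transaction. */
static int mock_roll(struct mock_args *args)
{
	args->rolls++;
	return 0;
}

/* The loop: retry the format step until it stops returning -EAGAIN. */
static int mock_set_args(struct mock_args *args)
{
	bool	done = false;
	int	error;

	do {
		error = mock_set_fmt(args, &done);
		if (error != -EAGAIN)
			break;
		error = mock_roll(args);
	} while (!error);

	return error;
}
```

Starting from a full shortform fork, this mock converts twice (to leaf,
then to node) and rolls once after each conversion before the add
succeeds.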

Brian

--- 8< ---

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index fd8e6418a0d3..eff8833d5303 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -58,6 +58,8 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
 				 struct xfs_da_state **state);
 STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
 STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
+STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *, struct xfs_buf *);
+STATIC int xfs_attr_node_addname_work(struct xfs_da_args *);
 
 int
 xfs_inode_hasattr(
@@ -216,116 +218,93 @@ xfs_attr_is_shortform(
 		ip->i_afp->if_nextents == 0);
 }
 
-/*
- * Attempts to set an attr in shortform, or converts short form to leaf form if
- * there is not enough room.  If the attr is set, the transaction is committed
- * and set to NULL.
- */
-STATIC int
-xfs_attr_set_shortform(
+int
+xfs_attr_set_fmt(
 	struct xfs_da_args	*args,
-	struct xfs_buf		**leaf_bp)
+	bool			*done)
 {
 	struct xfs_inode	*dp = args->dp;
-	int			error, error2 = 0;
+	struct xfs_buf		*leaf_bp = NULL;
+	int			error = 0;
 
-	/*
-	 * Try to add the attr to the attribute list in the inode.
-	 */
-	error = xfs_attr_try_sf_addname(dp, args);
-	if (error != -ENOSPC) {
-		error2 = xfs_trans_commit(args->trans);
-		args->trans = NULL;
-		return error ? error : error2;
+	if (xfs_attr_is_shortform(dp)) {
+		error = xfs_attr_try_sf_addname(dp, args);
+		if (!error)
+			*done = true;
+		if (error != -ENOSPC)
+			return error;
+
+		error = xfs_attr_shortform_to_leaf(args, &leaf_bp);
+		if (error)
+			return error;
+		return -EAGAIN;
 	}
-	/*
-	 * It won't fit in the shortform, transform to a leaf block.  GROT:
-	 * another possible req'mt for a double-split btree op.
-	 */
-	error = xfs_attr_shortform_to_leaf(args, leaf_bp);
-	if (error)
-		return error;
 
-	/*
-	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
-	 * push cannot grab the half-baked leaf buffer and run into problems
-	 * with the write verifier. Once we're done rolling the transaction we
-	 * can release the hold and add the attr to the leaf.
-	 */
-	xfs_trans_bhold(args->trans, *leaf_bp);
-	error = xfs_defer_finish(&args->trans);
-	xfs_trans_bhold_release(args->trans, *leaf_bp);
-	if (error) {
-		xfs_trans_brelse(args->trans, *leaf_bp);
-		return error;
+	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
+		struct xfs_buf	*bp = NULL;
+
+		error = xfs_attr_leaf_try_add(args, bp);
+		if (error != -ENOSPC)
+			return error;
+
+		error = xfs_attr3_leaf_to_node(args);
+		if (error)
+			return error;
+		return -EAGAIN;
 	}
 
-	return 0;
+	return xfs_attr_node_addname(args);
 }
 
 /*
  * Set the attribute specified in @args.
  */
 int
-xfs_attr_set_args(
+__xfs_attr_set_args(
 	struct xfs_da_args	*args)
 {
 	struct xfs_inode	*dp = args->dp;
-	struct xfs_buf          *leaf_bp = NULL;
 	int			error = 0;
 
-	/*
-	 * If the attribute list is already in leaf format, jump straight to
-	 * leaf handling.  Otherwise, try to add the attribute to the shortform
-	 * list; if there's no room then convert the list to leaf format and try
-	 * again.
-	 */
-	if (xfs_attr_is_shortform(dp)) {
-
-		/*
-		 * If the attr was successfully set in shortform, the
-		 * transaction is committed and set to NULL.  Otherwise, is it
-		 * converted from shortform to leaf, and the transaction is
-		 * retained.
-		 */
-		error = xfs_attr_set_shortform(args, &leaf_bp);
-		if (error || !args->trans)
-			return error;
-	}
-
 	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
 		error = xfs_attr_leaf_addname(args);
-		if (error != -ENOSPC)
-			return error;
-
-		/*
-		 * Promote the attribute list to the Btree format.
-		 */
-		error = xfs_attr3_leaf_to_node(args);
 		if (error)
 			return error;
+	}
+
+	error = xfs_attr_node_addname_work(args);
+	return error;
+}
+
+int
+xfs_attr_set_args(
+	struct xfs_da_args	*args)
+
+{
+	int			error;
+	bool			done = false;
+
+	do {
+		error = xfs_attr_set_fmt(args, &done);
+		if (error != -EAGAIN)
+			break;
 
-		/*
-		 * Finish any deferred work items and roll the transaction once
-		 * more.  The goal here is to call node_addname with the inode
-		 * and transaction in the same state (inode locked and joined,
-		 * transaction clean) no matter how we got to this step.
-		 */
 		error = xfs_defer_finish(&args->trans);
 		if (error)
-			return error;
+			break;
+		error = xfs_trans_roll_inode(&args->trans, args->dp);
+	} while (!error);
 
-		/*
-		 * Commit the current trans (including the inode) and
-		 * start a new one.
-		 */
-		error = xfs_trans_roll_inode(&args->trans, dp);
-		if (error)
-			return error;
-	}
+	if (error || done)
+		return error;
 
-	error = xfs_attr_node_addname(args);
-	return error;
+	error = xfs_defer_finish(&args->trans);
+	if (!error)
+		error = xfs_trans_roll_inode(&args->trans, args->dp);
+	if (error)
+		return error;
+
+	return __xfs_attr_set_args(args);
 }
 
 /*
@@ -676,18 +655,6 @@ xfs_attr_leaf_addname(
 
 	trace_xfs_attr_leaf_addname(args);
 
-	error = xfs_attr_leaf_try_add(args, bp);
-	if (error)
-		return error;
-
-	/*
-	 * Commit the transaction that added the attr name so that
-	 * later routines can manage their own transactions.
-	 */
-	error = xfs_trans_roll_inode(&args->trans, dp);
-	if (error)
-		return error;
-
 	/*
 	 * If there was an out-of-line value, allocate the blocks we
 	 * identified for its storage and copy the value.  This is done
@@ -923,7 +890,7 @@ xfs_attr_node_addname(
 	 * Fill in bucket of arguments/results/context to carry around.
 	 */
 	dp = args->dp;
-restart:
+
 	/*
 	 * Search to see if name already exists, and get back a pointer
 	 * to where it should go.
@@ -967,21 +934,10 @@ xfs_attr_node_addname(
 			xfs_da_state_free(state);
 			state = NULL;
 			error = xfs_attr3_leaf_to_node(args);
-			if (error)
-				goto out;
-			error = xfs_defer_finish(&args->trans);
 			if (error)
 				goto out;
 
-			/*
-			 * Commit the node conversion and start the next
-			 * trans in the chain.
-			 */
-			error = xfs_trans_roll_inode(&args->trans, dp);
-			if (error)
-				goto out;
-
-			goto restart;
+			return -EAGAIN;
 		}
 
 		/*
@@ -993,9 +949,6 @@ xfs_attr_node_addname(
 		error = xfs_da3_split(state);
 		if (error)
 			goto out;
-		error = xfs_defer_finish(&args->trans);
-		if (error)
-			goto out;
 	} else {
 		/*
 		 * Addition succeeded, update Btree hashvals.
@@ -1010,13 +963,23 @@ xfs_attr_node_addname(
 	xfs_da_state_free(state);
 	state = NULL;
 
-	/*
-	 * Commit the leaf addition or btree split and start the next
-	 * trans in the chain.
-	 */
-	error = xfs_trans_roll_inode(&args->trans, dp);
+	return 0;
+
+out:
+	if (state)
+		xfs_da_state_free(state);
 	if (error)
-		goto out;
+		return error;
+	return retval;
+}
+
+STATIC int
+xfs_attr_node_addname_work(
+	struct xfs_da_args	*args)
+{
+	struct xfs_da_state	*state;
+	struct xfs_da_state_blk	*blk;
+	int			retval, error;
 
 	/*
 	 * If there was an out-of-line value, allocate the blocks we
Allison Henderson Dec. 24, 2020, 8:23 a.m. UTC | #9
On 12/23/20 7:16 AM, Brian Foster wrote:
> On Tue, Dec 22, 2020 at 10:20:16PM -0700, Allison Henderson wrote:
>>
>>
>> On 12/22/20 11:44 AM, Brian Foster wrote:
>>> On Tue, Dec 22, 2020 at 12:20:20PM -0500, Brian Foster wrote:
>>>> On Tue, Dec 22, 2020 at 12:11:48PM -0500, Brian Foster wrote:
>>>>> On Fri, Dec 18, 2020 at 12:29:06AM -0700, Allison Henderson wrote:
>>>>>> This patch modifies the attr remove routines to be delay ready. This
>>>>>> means they no longer roll or commit transactions, but instead return
>>>>>> -EAGAIN to have the calling routine roll and refresh the transaction. In
>>>>>> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
>>>>>> uses a sort of state machine like switch to keep track of where it was
>>>>>> when EAGAIN was returned. xfs_attr_node_removename has also been
>>>>>> modified to use the switch, and a new version of xfs_attr_remove_args
>>>>>> consists of a simple loop to refresh the transaction until the operation
>>>>>> is completed. A new XFS_DAC_DEFER_FINISH flag is used to finish the
>>>>>> transaction wherever the existing code used to.
>>>>>>
>>>>>> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
>>>>>> version __xfs_attr_rmtval_remove. We will rename
>>>>>> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
>>>>>> done.
>>>>>>
>>>>>> xfs_attr_rmtval_remove itself is still in use by the set routines (used
>>>>>> during a rename).  For reasons of preserving existing function, we
>>>>>> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
>>>>>> set.  Similar to how xfs_attr_remove_args does here.  Once we transition
>>>>>> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
>>>>>> used and will be removed.
>>>>>>
>>>>>> This patch also adds a new struct xfs_delattr_context, which we will use
>>>>>> to keep track of the current state of an attribute operation. The new
>>>>>> xfs_delattr_state enum is used to track various operations that are in
>>>>>> progress so that we know not to repeat them, and resume where we left
>>>>>> off before EAGAIN was returned to cycle out the transaction. Other
>>>>>> members take the place of local variables that need to retain their
>>>>>> values across multiple function recalls.  See xfs_attr.h for a more
>>>>>> detailed diagram of the states.
>>>>>>
>>>>>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>>>>>> ---
>>>>>
>>>>> I started with a couple small comments on this patch but inevitably
>>>>> started thinking more about the factoring again and ended up with a
>>>>> couple patches on top. The first is more of some small tweaks and
>>>>> open-coding that IMO makes this patch a bit easier to follow. The
>>>>> second is more of an RFC so I'll follow up with that in a second email.
>>>>> I'm curious what folks' thoughts might be on either. Also note that I'm
>>>>> primarily focusing on code structure and whatnot here, so these are fast
>>>>> and loose, compile tested only and likely to be broken.
>>>>>
>>>>
>>>> ... and here's the second diff (applies on top of the first).
>>>>
>>>> This one popped up after staring at the previous changes for a bit and
>>>> wondering whether using "done flags" might make the whole thing easier
>>>> to follow than incremental state transitions. I think the attr remove
>>>> path is easy enough to follow with either method, but the attr set path
>>>> is a beast and so this is more with that in mind. Initial thoughts?
>>>>
>>>
>>> Eh, the more I stare at the attr set code I'm not sure this by itself is
>>> much of an improvement. It helps in some areas, but there are so many
>>> transaction rolls embedded throughout at different levels that a larger
>>> rework of the code is probably still necessary. Anyways, this was just a
>>> random thought for now..
>>>
>>> Brian
>>
>> No worries, I know the feeling :-)  The set works and all, but I do think
>> there is a struggle around trying to find a particularly pleasant looking
>> presentation of it.  Especially when we get into the set path, it's a bit
>> more complex.  I may pick through the patches you have here and pick up the
>> whitespace cleanups and other style adjustments if people prefer it that
>> way.  The good news is, a lot of the *_args routines are supposed to
>> disappear at the end of the set, so there's not really a need to invest too
>> much in them I suppose. It may help to jump to the "Set up infrastructure"
>> patch too.  I've expanded the diagram to try and help illustrate the code
>> flow a bit, so that may make it easier to follow.
>>
> 
> I'm sure.. :P Note that the first patch was more about smaller tweaks and
> refactoring with the existing model in mind. For the set path, the
> challenge IMO is to make the code generally more readable. I think the
> remove path accomplishes this for the most part because the states and
> whatnot are fairly low overhead on top of the existing complexity. This
> changes considerably for the set path, not so much due to the mechanism
> but because the baseline code is so fragmented and complex from the
> start. I am slightly concerned that bolting state management onto the
> current code as such might make it harder to grok and clean up after the
> fact, but I could be wrong about that (my hope was certainly for the
> opposite).
tbh, every time I do another spin of the set, I actually make all my 
modifications on top of the extended set, with parent pointers and all, 
and make sure all the test cases are still good.  I know pptrs are still 
pretty far out from here, but they're actually the best testcase for 
this, because they generate so much more activity.  If all that's still 
golden, then I'll pull them back down into the lower subsets and work 
out all the conflicts on the way back up.  If something went wrong, 
diffing the branch heads tracks it down pretty fast.

> 
> Regardless, that had me shifting focus a bit and playing around with the
> current upstream code as opposed to shifting around your code. ISTM that
> there is some commonality across the various set codepaths and perhaps
> there is potential to simplify things notably _before_ applying the
> state management scheme. I've appended a new diff below (based on
> for-next) that starts to demonstrate what I mean. Note again that this
> is similarly fast and loose as I've knowingly thrown away some quirks of
> the code (e.g. leaf buffer bhold) for the purpose of quickly trying to
> explore/POC whether the factoring might be sane and plausible.
> 
> In summary, this combines the "try addname" part of each xattr format to
> fall under a single transaction rolling loop such that I think the
> resulting function could become one high level state. I ran out of time
> for working through the rest, but from a read through it seems there's
> at least a chance we could continue with similar refactoring and
> reduction to a fewer number of generic states (vs. more format-specific
> states). For example, the remaining parts of the set operation all seem
> to have something along the lines of the following high level
> components:
> 
> - remote value block allocation (and value set)
> - if rename == false, clear flag and done
> - if rename == true, flip flags
> 	- remove old xattr (i.e., similar to xattr remove)
> 
> ... where much of that code looks remarkably similar across the
> different leaf/node code branches. So I'm curious what you and others
> following along might think about something like this as an intermediate
> step...

Yes, I had noticed similarities when we first started, though I got the 
impression that people mostly wanted to focus on just hoisting the 
transactions upwards.  I did look at them at one point, but seem to 
recall the similarities having just enough dissimilarities such that 
trying to consolidate them tends to introduce about as much plumbing 
with if/else's.  In any case, I do think the solution here with the 
format handling is creative, and may reduce a state or two, but I'd 
really need to see it through the test cases to know if it's going to 
work.  From what you've hashed out here, I think I get the idea. It's 
hard for me to comment on readability because I've been up and down the 
code so much.  I do think it's a little loopy looking, but so is the 
state machine.  Maybe a good spot for others to chime in too.
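
For comparison with the loop approach, the switch-style state machine
that the series uses can be reduced to a small userspace sketch. The
names below (mock_state, mock_ctx, mock_removename_iter) are hypothetical
and only loosely modelled on XFS_DAS_UNINIT and XFS_DAS_RM_SHRINK; the
real steps are replaced by two integer fields.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states; not the kernel's enum xfs_delattr_state. */
enum mock_state { MOCK_UNINIT, MOCK_RM_SHRINK };

struct mock_ctx {
	enum mock_state	state;	/* where to resume after -EAGAIN */
	int		joined;	/* pretend the first step needed a join */
	int		shrunk;	/* set once the shrink step has run */
};

/*
 * Resumable step function: on -EAGAIN the caller is expected to roll
 * the transaction and call again; the switch then jumps to the saved
 * state instead of repeating the first step.
 */
static int mock_removename_iter(struct mock_ctx *ctx)
{
	switch (ctx->state) {
	case MOCK_UNINIT:
		if (ctx->joined) {
			/* Record the resume point, then ask for a roll. */
			ctx->state = MOCK_RM_SHRINK;
			return -EAGAIN;
		}
		/* fallthrough */
	case MOCK_RM_SHRINK:
		ctx->shrunk = 1;
		return 0;
	default:
		return -EINVAL;
	}
}
```

When no join was needed, the first call falls straight through to the
shrink step and returns 0; otherwise the first call returns -EAGAIN and
the second call resumes at the shrink step.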

I actually find it easier to work on it from the top of the set rather 
than the bottom.  Just so that the end goal of what it will end up 
looking like is a little more clear.  Once the goal is clear, then I 
worry about layering it into whatever patch it goes in.  Otherwise it's 
harder to see exactly how the conflicts shake out.

Allison
> 
> Brian
> 
> --- 8< ---
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index fd8e6418a0d3..eff8833d5303 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -58,6 +58,8 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>   				 struct xfs_da_state **state);
>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>   STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
> +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *, struct xfs_buf *);
> +STATIC int xfs_attr_node_addname_work(struct xfs_da_args *);
>   
>   int
>   xfs_inode_hasattr(
> @@ -216,116 +218,93 @@ xfs_attr_is_shortform(
>   		ip->i_afp->if_nextents == 0);
>   }
>   
> -/*
> - * Attempts to set an attr in shortform, or converts short form to leaf form if
> - * there is not enough room.  If the attr is set, the transaction is committed
> - * and set to NULL.
> - */
> -STATIC int
> -xfs_attr_set_shortform(
> +int
> +xfs_attr_set_fmt(
>   	struct xfs_da_args	*args,
> -	struct xfs_buf		**leaf_bp)
> +	bool			*done)
>   {
>   	struct xfs_inode	*dp = args->dp;
> -	int			error, error2 = 0;
> +	struct xfs_buf		*leaf_bp = NULL;
> +	int			error = 0;
>   
> -	/*
> -	 * Try to add the attr to the attribute list in the inode.
> -	 */
> -	error = xfs_attr_try_sf_addname(dp, args);
> -	if (error != -ENOSPC) {
> -		error2 = xfs_trans_commit(args->trans);
> -		args->trans = NULL;
> -		return error ? error : error2;
> +	if (xfs_attr_is_shortform(dp)) {
> +		error = xfs_attr_try_sf_addname(dp, args);
> +		if (!error)
> +			*done = true;
> +		if (error != -ENOSPC)
> +			return error;
> +
> +		error = xfs_attr_shortform_to_leaf(args, &leaf_bp);
> +		if (error)
> +			return error;
> +		return -EAGAIN;
>   	}
> -	/*
> -	 * It won't fit in the shortform, transform to a leaf block.  GROT:
> -	 * another possible req'mt for a double-split btree op.
> -	 */
> -	error = xfs_attr_shortform_to_leaf(args, leaf_bp);
> -	if (error)
> -		return error;
>   
> -	/*
> -	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
> -	 * push cannot grab the half-baked leaf buffer and run into problems
> -	 * with the write verifier. Once we're done rolling the transaction we
> -	 * can release the hold and add the attr to the leaf.
> -	 */
> -	xfs_trans_bhold(args->trans, *leaf_bp);
> -	error = xfs_defer_finish(&args->trans);
> -	xfs_trans_bhold_release(args->trans, *leaf_bp);
> -	if (error) {
> -		xfs_trans_brelse(args->trans, *leaf_bp);
> -		return error;
> +	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> +		struct xfs_buf	*bp = NULL;
> +
> +		error = xfs_attr_leaf_try_add(args, bp);
> +		if (error != -ENOSPC)
> +			return error;
> +
> +		error = xfs_attr3_leaf_to_node(args);
> +		if (error)
> +			return error;
> +		return -EAGAIN;
>   	}
>   
> -	return 0;
> +	return xfs_attr_node_addname(args);
>   }
>   
>   /*
>    * Set the attribute specified in @args.
>    */
>   int
> -xfs_attr_set_args(
> +__xfs_attr_set_args(
>   	struct xfs_da_args	*args)
>   {
>   	struct xfs_inode	*dp = args->dp;
> -	struct xfs_buf          *leaf_bp = NULL;
>   	int			error = 0;
>   
> -	/*
> -	 * If the attribute list is already in leaf format, jump straight to
> -	 * leaf handling.  Otherwise, try to add the attribute to the shortform
> -	 * list; if there's no room then convert the list to leaf format and try
> -	 * again.
> -	 */
> -	if (xfs_attr_is_shortform(dp)) {
> -
> -		/*
> -		 * If the attr was successfully set in shortform, the
> -		 * transaction is committed and set to NULL.  Otherwise, is it
> -		 * converted from shortform to leaf, and the transaction is
> -		 * retained.
> -		 */
> -		error = xfs_attr_set_shortform(args, &leaf_bp);
> -		if (error || !args->trans)
> -			return error;
> -	}
> -
>   	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>   		error = xfs_attr_leaf_addname(args);
> -		if (error != -ENOSPC)
> -			return error;
> -
> -		/*
> -		 * Promote the attribute list to the Btree format.
> -		 */
> -		error = xfs_attr3_leaf_to_node(args);
>   		if (error)
>   			return error;
> +	}
> +
> +	error = xfs_attr_node_addname_work(args);
> +	return error;
> +}
> +
> +int
> +xfs_attr_set_args(
> +	struct xfs_da_args	*args)
> +
> +{
> +	int			error;
> +	bool			done = false;
> +
> +	do {
> +		error = xfs_attr_set_fmt(args, &done);
> +		if (error != -EAGAIN)
> +			break;
>   
> -		/*
> -		 * Finish any deferred work items and roll the transaction once
> -		 * more.  The goal here is to call node_addname with the inode
> -		 * and transaction in the same state (inode locked and joined,
> -		 * transaction clean) no matter how we got to this step.
> -		 */
>   		error = xfs_defer_finish(&args->trans);
>   		if (error)
> -			return error;
> +			break;
> +		error = xfs_trans_roll_inode(&args->trans, args->dp);
> +	} while (!error);
>   
> -		/*
> -		 * Commit the current trans (including the inode) and
> -		 * start a new one.
> -		 */
> -		error = xfs_trans_roll_inode(&args->trans, dp);
> -		if (error)
> -			return error;
> -	}
> +	if (error || done)
> +		return error;
>   
> -	error = xfs_attr_node_addname(args);
> -	return error;
> +	error = xfs_defer_finish(&args->trans);
> +	if (!error)
> +		error = xfs_trans_roll_inode(&args->trans, args->dp);
> +	if (error)
> +		return error;
> +
> +	return __xfs_attr_set_args(args);
>   }
>   
>   /*
> @@ -676,18 +655,6 @@ xfs_attr_leaf_addname(
>   
>   	trace_xfs_attr_leaf_addname(args);
>   
> -	error = xfs_attr_leaf_try_add(args, bp);
> -	if (error)
> -		return error;
> -
> -	/*
> -	 * Commit the transaction that added the attr name so that
> -	 * later routines can manage their own transactions.
> -	 */
> -	error = xfs_trans_roll_inode(&args->trans, dp);
> -	if (error)
> -		return error;
> -
>   	/*
>   	 * If there was an out-of-line value, allocate the blocks we
>   	 * identified for its storage and copy the value.  This is done
> @@ -923,7 +890,7 @@ xfs_attr_node_addname(
>   	 * Fill in bucket of arguments/results/context to carry around.
>   	 */
>   	dp = args->dp;
> -restart:
> +
>   	/*
>   	 * Search to see if name already exists, and get back a pointer
>   	 * to where it should go.
> @@ -967,21 +934,10 @@ xfs_attr_node_addname(
>   			xfs_da_state_free(state);
>   			state = NULL;
>   			error = xfs_attr3_leaf_to_node(args);
> -			if (error)
> -				goto out;
> -			error = xfs_defer_finish(&args->trans);
>   			if (error)
>   				goto out;
>   
> -			/*
> -			 * Commit the node conversion and start the next
> -			 * trans in the chain.
> -			 */
> -			error = xfs_trans_roll_inode(&args->trans, dp);
> -			if (error)
> -				goto out;
> -
> -			goto restart;
> +			return -EAGAIN;
>   		}
>   
>   		/*
> @@ -993,9 +949,6 @@ xfs_attr_node_addname(
>   		error = xfs_da3_split(state);
>   		if (error)
>   			goto out;
> -		error = xfs_defer_finish(&args->trans);
> -		if (error)
> -			goto out;
>   	} else {
>   		/*
>   		 * Addition succeeded, update Btree hashvals.
> @@ -1010,13 +963,23 @@ xfs_attr_node_addname(
>   	xfs_da_state_free(state);
>   	state = NULL;
>   
> -	/*
> -	 * Commit the leaf addition or btree split and start the next
> -	 * trans in the chain.
> -	 */
> -	error = xfs_trans_roll_inode(&args->trans, dp);
> +	return 0;
> +
> +out:
> +	if (state)
> +		xfs_da_state_free(state);
>   	if (error)
> -		goto out;
> +		return error;
> +	return retval;
> +}
> +
> +STATIC int
> +xfs_attr_node_addname_work(
> +	struct xfs_da_args	*args)
> +{
> +	struct xfs_da_state	*state;
> +	struct xfs_da_state_blk	*blk;
> +	int			retval, error;
>   
>   	/*
>   	 * If there was an out-of-line value, allocate the blocks we
>
Brian Foster Jan. 4, 2021, 5:52 p.m. UTC | #10
On Thu, Dec 24, 2020 at 01:23:24AM -0700, Allison Henderson wrote:
> 
> 
> On 12/23/20 7:16 AM, Brian Foster wrote:
> > On Tue, Dec 22, 2020 at 10:20:16PM -0700, Allison Henderson wrote:
> > > 
> > > 
> > > On 12/22/20 11:44 AM, Brian Foster wrote:
> > > > On Tue, Dec 22, 2020 at 12:20:20PM -0500, Brian Foster wrote:
> > > > > On Tue, Dec 22, 2020 at 12:11:48PM -0500, Brian Foster wrote:
> > > > > > On Fri, Dec 18, 2020 at 12:29:06AM -0700, Allison Henderson wrote:
> > > > > > > This patch modifies the attr remove routines to be delay ready. This
> > > > > > > means they no longer roll or commit transactions, but instead return
> > > > > > > -EAGAIN to have the calling routine roll and refresh the transaction. In
> > > > > > > this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> > > > > > > uses a sort of state machine like switch to keep track of where it was
> > > > > > > when EAGAIN was returned. xfs_attr_node_removename has also been
> > > > > > > modified to use the switch, and a new version of xfs_attr_remove_args
> > > > > > > consists of a simple loop to refresh the transaction until the operation
> > > > > > > is completed. A new XFS_DAC_DEFER_FINISH flag is used to finish the
> > > > > > > transaction where ever the existing code used to.
> > > > > > > 
> > > > > > > Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> > > > > > > version __xfs_attr_rmtval_remove. We will rename
> > > > > > > __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> > > > > > > done.
> > > > > > > 
> > > > > > > xfs_attr_rmtval_remove itself is still in use by the set routines (used
> > > > > > > during a rename).  For reasons of preserving existing function, we
> > > > > > > modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> > > > > > > set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> > > > > > > the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> > > > > > > used and will be removed.
> > > > > > > 
> > > > > > > This patch also adds a new struct xfs_delattr_context, which we will use
> > > > > > > to keep track of the current state of an attribute operation. The new
> > > > > > > xfs_delattr_state enum is used to track various operations that are in
> > > > > > > progress so that we know not to repeat them, and resume where we left
> > > > > > > off before EAGAIN was returned to cycle out the transaction. Other
> > > > > > > members take the place of local variables that need to retain their
> > > > > > > values across multiple function recalls.  See xfs_attr.h for a more
> > > > > > > detailed diagram of the states.
> > > > > > > 
> > > > > > > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > > > > > > ---
> > > > > > 
> > > > > > I started with a couple small comments on this patch but inevitably
> > > > > > started thinking more about the factoring again and ended up with a
> > > > > > couple patches on top. The first is more of some small tweaks and
> > > > > > open-coding that IMO makes this patch a bit easier to follow. The
> > > > > > second is more of an RFC so I'll follow up with that in a second email.
> > > > > > I'm curious what folks' thoughts might be on either. Also note that I'm
> > > > > > primarily focusing on code structure and whatnot here, so these are fast
> > > > > > and loose, compile tested only and likely to be broken.
> > > > > > 
> > > > > 
> > > > > ... and here's the second diff (applies on top of the first).
> > > > > 
> > > > > This one popped up after staring at the previous changes for a bit and
> > > > > wondering whether using "done flags" might make the whole thing easier
> > > > > to follow than incremental state transitions. I think the attr remove
> > > > > path is easy enough to follow with either method, but the attr set path
> > > > > is a beast and so this is more with that in mind. Initial thoughts?
> > > > > 
> > > > 
> > > > Eh, the more I stare at the attr set code I'm not sure this by itself is
> > > > much of an improvement. It helps in some areas, but there are so many
> > > > transaction rolls embedded throughout at different levels that a larger
> > > > rework of the code is probably still necessary. Anyways, this was just a
> > > > random thought for now..
> > > > 
> > > > Brian
> > > 
> > > No worries, I know the feeling :-)  The set works and all, but I do think
> > > there is a struggle around trying to find a particularly pleasant looking
> > > presentation of it.  Especially when we get into the set path, it's a bit
> > > more complex.  I may pick through the patches you have here and pick up the
> > > whitespace cleanups and other style adjustments if people prefer it that
> > > way.  The good news is, a lot of the *_args routines are supposed to
> > > disappear at the end of the set, so there's not really a need to invest too
> > > much in them I suppose. It may help to jump to the "Set up infastructure"
> > > patch too.  I've expanded the diagram to try and help illustrate the code
> > > flow a bit, so that may help with following the code flow.
> > > 
> > 
> > I'm sure.. :P Note that the first patch was more smaller tweaks and
> > refactoring with the existing model in mind. For the set path, the
> > challenge IMO is to make the code generally more readable. I think the
> > remove path accomplishes this for the most part because the states and
> > whatnot are fairly low overhead on top of the existing complexity. This
> > changes considerably for the set path, not so much due to the mechanism
> > but because the baseline code is so fragmented and complex from the
> > start. I am slightly concerned that bolting state management onto the
> > current code as such might make it harder to grok and clean up after the
> > fact, but I could be wrong about that (my hope was certainly for the
> > opposite).
> tbh, every time I do another spin of the set, I actually make all my
> modifications on top of the extended set, with parent pointers and all, and
> make sure all the test cases are still good.  I know pptrs are still pretty
> far out from here, but they're actually the best testcase for this, because
> it generates so much more activity.  If all that's still golden, then I'll
> pull them back down into the lower subsets and work out all the conflicts on
> the way back up.  If something went wrong, diffing the branch heads tracks
> it down pretty fast.
> 

Indeed, that's a good thing. My comment was more around the readability
of the code and subsequent ability to clean it up, reduce the number of
required states, etc...

> > 
> > Regardless, that had me shifting focus a bit and playing around with the
> > current upstream code as opposed to shifting around your code. ISTM that
> > there is some commonality across the various set codepaths and perhaps
> > there is potential to simplify things notably _before_ applying the
> > state management scheme. I've appended a new diff below (based on
> > for-next) that starts to demonstrate what I mean. Note again that this
> > is similarly fast and loose as I've knowingly thrown away some quirks of
> > the code (i.e. leaf buffer bhold) for the purpose of quickly trying to
> > explore/POC whether the factoring might be sane and plausible.
> > 
> > In summary, this combines the "try addname" part of each xattr format to
> > fall under a single transaction rolling loop such that I think the
> > resulting function could become one high level state. I ran out of time
> > for working through the rest, but from a read through it seems there's
> > at least a chance we could continue with similar refactoring and
> > reduction to a fewer number of generic states (vs. more format-specific
> > states). For example, the remaining parts of the set operation all seem
> > to have something along the lines of the following high level
> > components:
> > 
> > - remote value block allocation (and value set)
> > - if rename == true, clear flag and done
> > - if rename == false, flip flags
> > 	- remove old xattr (i.e., similar to xattr remove)
> > 
> > ... where much of that code looks remarkably similar across the
> > different leaf/node code branches. So I'm curious what you and others
> > following along might think about something like this as an intermediate
> > step...
> 
> Yes, I had noticed similarities when we first started, though I got the
> impression that people mostly wanted to focus on just hoisting the
> transactions upwards.  I did look at them at one point, but seem to recall
> the similarities having just enough dissimilarities such that trying to
> consolidate them tends to introduce about as much plumbing with if/else's.
> In any case, I do think the solution here with the format handling is
> creative, and may reduce a state or two, but I'd really need to see it
> through the test cases to know if it's going to work.  From what you've
> hashed out here, I think I get the idea. It's hard for me to comment on
> readability because I've been up and down the code so much.  I do think it's
> a little loopy looking, but so is the statemachine.  Maybe a good spot for
> others to chime in too.
> 

Can you elaborate on what you mean by loopy? :P I'm sure you noticed I
borrowed the transaction rolling mechanism from your infra patch..

But yeah, I'm partly to blame for the hoisting approach as well. I was
thinking/hoping that seeing the various states would facilitate
simplification of the code, but my first reaction when looking at the
(much more complex) xattr set path is more confusion than clarity. I see
the code drop into state management, using that to call into
format-specific helpers, then fall into doing some other stuff that
might call into some of the same format-specific add helpers, then
realize I'll probably have to trace up and down through the whole path
to make some sense of the execution flow. That is what has me wondering
whether this would become more simple with fewer, generic and higher
level states like SET_FORMAT (i.e. what I hacked up), SET_NAME,
SET_VALUE (rmt block allocs), SET_FLAG (clear or flip), and then finally
fall into the remove path in the rename case.

We'd ultimately implement the same type of state machine approach, it
would just require more up front cleanup rework than the other way
around, and hopefully land fairly simplified from the onset. Of course
those states are just off the top of my head so might not be feasible,
but I'm also curious if any others following along might have thoughts
one way or the other. I'm sure we could implement things in either order
when it comes down to it...
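
To make the "fewer, generic states" idea a bit more concrete, here is a
rough userspace sketch of what such a driver could look like. It is only
an illustration under assumptions: the enum values mirror the state names
floated above, but struct set_op, set_step(), set_run() and the
trans_rolls counter are hypothetical stand-ins and not code from this
series or from XFS. Each -EAGAIN models "roll the transaction and call
me again", and the rename case falls into a remove step at the end.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical high level states; names follow the list in the mail. */
enum set_state {
	SET_FORMAT,	/* pick/convert the attr fork format */
	SET_NAME,	/* add the name to the chosen format */
	SET_VALUE,	/* remote value block allocation */
	SET_FLAG,	/* clear the flag, or flip flags on rename */
	SET_REMOVE_OLD,	/* rename only: remove the old attr */
	SET_DONE,
};

/* Hypothetical context that survives across transaction rolls. */
struct set_op {
	enum set_state	state;
	bool		rename;
	bool		remote_value;
	int		trans_rolls;	/* counts simulated rolls */
};

/* Perform one transaction's worth of work, then ask for a roll. */
static int set_step(struct set_op *op)
{
	switch (op->state) {
	case SET_FORMAT:
		op->state = SET_NAME;
		return -EAGAIN;
	case SET_NAME:
		/* Only remote values need the block allocation step. */
		op->state = op->remote_value ? SET_VALUE : SET_FLAG;
		return -EAGAIN;
	case SET_VALUE:
		op->state = SET_FLAG;
		return -EAGAIN;
	case SET_FLAG:
		if (!op->rename) {
			op->state = SET_DONE;
			return 0;
		}
		op->state = SET_REMOVE_OLD;
		return -EAGAIN;
	case SET_REMOVE_OLD:
		op->state = SET_DONE;
		return 0;
	case SET_DONE:
		return 0;
	}
	return -EINVAL;
}

/* Caller loop: roll (here, just count) whenever a step says -EAGAIN. */
static int set_run(struct set_op *op)
{
	int error;

	while ((error = set_step(op)) == -EAGAIN)
		op->trans_rolls++;
	return error;
}
```

The point of the sketch is only that the format-specific branching would
live inside each step rather than in the state list itself; whether the
real code can be carved up this way is exactly the open question.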

Brian

> I actually find it easier to work on it from the top of the set rather than
> the bottom.  Just so that the end goal of what it will end up looking like
> is a little more clear.  Once the goal is clear, then I worry about layering
> it in what ever patch it goes in.  Otherwise it's harder to see exactly how
> the conflicts shake out.
> 
> Allison
> > 
> > Brian
> > 
> > --- 8< ---
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index fd8e6418a0d3..eff8833d5303 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -58,6 +58,8 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
> >   				 struct xfs_da_state **state);
> >   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> >   STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
> > +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *, struct xfs_buf *);
> > +STATIC int xfs_attr_node_addname_work(struct xfs_da_args *);
> >   int
> >   xfs_inode_hasattr(
> > @@ -216,116 +218,93 @@ xfs_attr_is_shortform(
> >   		ip->i_afp->if_nextents == 0);
> >   }
> > -/*
> > - * Attempts to set an attr in shortform, or converts short form to leaf form if
> > - * there is not enough room.  If the attr is set, the transaction is committed
> > - * and set to NULL.
> > - */
> > -STATIC int
> > -xfs_attr_set_shortform(
> > +int
> > +xfs_attr_set_fmt(
> >   	struct xfs_da_args	*args,
> > -	struct xfs_buf		**leaf_bp)
> > +	bool			*done)
> >   {
> >   	struct xfs_inode	*dp = args->dp;
> > -	int			error, error2 = 0;
> > +	struct xfs_buf		*leaf_bp = NULL;
> > +	int			error = 0;
> > -	/*
> > -	 * Try to add the attr to the attribute list in the inode.
> > -	 */
> > -	error = xfs_attr_try_sf_addname(dp, args);
> > -	if (error != -ENOSPC) {
> > -		error2 = xfs_trans_commit(args->trans);
> > -		args->trans = NULL;
> > -		return error ? error : error2;
> > +	if (xfs_attr_is_shortform(dp)) {
> > +		error = xfs_attr_try_sf_addname(dp, args);
> > +		if (!error)
> > +			*done = true;
> > +		if (error != -ENOSPC)
> > +			return error;
> > +
> > +		error = xfs_attr_shortform_to_leaf(args, &leaf_bp);
> > +		if (error)
> > +			return error;
> > +		return -EAGAIN;
> >   	}
> > -	/*
> > -	 * It won't fit in the shortform, transform to a leaf block.  GROT:
> > -	 * another possible req'mt for a double-split btree op.
> > -	 */
> > -	error = xfs_attr_shortform_to_leaf(args, leaf_bp);
> > -	if (error)
> > -		return error;
> > -	/*
> > -	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
> > -	 * push cannot grab the half-baked leaf buffer and run into problems
> > -	 * with the write verifier. Once we're done rolling the transaction we
> > -	 * can release the hold and add the attr to the leaf.
> > -	 */
> > -	xfs_trans_bhold(args->trans, *leaf_bp);
> > -	error = xfs_defer_finish(&args->trans);
> > -	xfs_trans_bhold_release(args->trans, *leaf_bp);
> > -	if (error) {
> > -		xfs_trans_brelse(args->trans, *leaf_bp);
> > -		return error;
> > +	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> > +		struct xfs_buf	*bp = NULL;
> > +
> > +		error = xfs_attr_leaf_try_add(args, bp);
> > +		if (error != -ENOSPC)
> > +			return error;
> > +
> > +		error = xfs_attr3_leaf_to_node(args);
> > +		if (error)
> > +			return error;
> > +		return -EAGAIN;
> >   	}
> > -	return 0;
> > +	return xfs_attr_node_addname(args);
> >   }
> >   /*
> >    * Set the attribute specified in @args.
> >    */
> >   int
> > -xfs_attr_set_args(
> > +__xfs_attr_set_args(
> >   	struct xfs_da_args	*args)
> >   {
> >   	struct xfs_inode	*dp = args->dp;
> > -	struct xfs_buf          *leaf_bp = NULL;
> >   	int			error = 0;
> > -	/*
> > -	 * If the attribute list is already in leaf format, jump straight to
> > -	 * leaf handling.  Otherwise, try to add the attribute to the shortform
> > -	 * list; if there's no room then convert the list to leaf format and try
> > -	 * again.
> > -	 */
> > -	if (xfs_attr_is_shortform(dp)) {
> > -
> > -		/*
> > -		 * If the attr was successfully set in shortform, the
> > -		 * transaction is committed and set to NULL.  Otherwise, is it
> > -		 * converted from shortform to leaf, and the transaction is
> > -		 * retained.
> > -		 */
> > -		error = xfs_attr_set_shortform(args, &leaf_bp);
> > -		if (error || !args->trans)
> > -			return error;
> > -	}
> > -
> >   	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> >   		error = xfs_attr_leaf_addname(args);
> > -		if (error != -ENOSPC)
> > -			return error;
> > -
> > -		/*
> > -		 * Promote the attribute list to the Btree format.
> > -		 */
> > -		error = xfs_attr3_leaf_to_node(args);
> >   		if (error)
> >   			return error;
> > +	}
> > +
> > +	error = xfs_attr_node_addname_work(args);
> > +	return error;
> > +}
> > +
> > +int
> > +xfs_attr_set_args(
> > +	struct xfs_da_args	*args)
> > +
> > +{
> > +	int			error;
> > +	bool			done = false;
> > +
> > +	do {
> > +		error = xfs_attr_set_fmt(args, &done);
> > +		if (error != -EAGAIN)
> > +			break;
> > -		/*
> > -		 * Finish any deferred work items and roll the transaction once
> > -		 * more.  The goal here is to call node_addname with the inode
> > -		 * and transaction in the same state (inode locked and joined,
> > -		 * transaction clean) no matter how we got to this step.
> > -		 */
> >   		error = xfs_defer_finish(&args->trans);
> >   		if (error)
> > -			return error;
> > +			break;
> > +		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > +	} while (!error);
> > -		/*
> > -		 * Commit the current trans (including the inode) and
> > -		 * start a new one.
> > -		 */
> > -		error = xfs_trans_roll_inode(&args->trans, dp);
> > -		if (error)
> > -			return error;
> > -	}
> > +	if (error || done)
> > +		return error;
> > -	error = xfs_attr_node_addname(args);
> > -	return error;
> > +	error = xfs_defer_finish(&args->trans);
> > +	if (!error)
> > +		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > +	if (error)
> > +		return error;
> > +
> > +	return __xfs_attr_set_args(args);
> >   }
> >   /*
> > @@ -676,18 +655,6 @@ xfs_attr_leaf_addname(
> >   	trace_xfs_attr_leaf_addname(args);
> > -	error = xfs_attr_leaf_try_add(args, bp);
> > -	if (error)
> > -		return error;
> > -
> > -	/*
> > -	 * Commit the transaction that added the attr name so that
> > -	 * later routines can manage their own transactions.
> > -	 */
> > -	error = xfs_trans_roll_inode(&args->trans, dp);
> > -	if (error)
> > -		return error;
> > -
> >   	/*
> >   	 * If there was an out-of-line value, allocate the blocks we
> >   	 * identified for its storage and copy the value.  This is done
> > @@ -923,7 +890,7 @@ xfs_attr_node_addname(
> >   	 * Fill in bucket of arguments/results/context to carry around.
> >   	 */
> >   	dp = args->dp;
> > -restart:
> > +
> >   	/*
> >   	 * Search to see if name already exists, and get back a pointer
> >   	 * to where it should go.
> > @@ -967,21 +934,10 @@ xfs_attr_node_addname(
> >   			xfs_da_state_free(state);
> >   			state = NULL;
> >   			error = xfs_attr3_leaf_to_node(args);
> > -			if (error)
> > -				goto out;
> > -			error = xfs_defer_finish(&args->trans);
> >   			if (error)
> >   				goto out;
> > -			/*
> > -			 * Commit the node conversion and start the next
> > -			 * trans in the chain.
> > -			 */
> > -			error = xfs_trans_roll_inode(&args->trans, dp);
> > -			if (error)
> > -				goto out;
> > -
> > -			goto restart;
> > +			return -EAGAIN;
> >   		}
> >   		/*
> > @@ -993,9 +949,6 @@ xfs_attr_node_addname(
> >   		error = xfs_da3_split(state);
> >   		if (error)
> >   			goto out;
> > -		error = xfs_defer_finish(&args->trans);
> > -		if (error)
> > -			goto out;
> >   	} else {
> >   		/*
> >   		 * Addition succeeded, update Btree hashvals.
> > @@ -1010,13 +963,23 @@ xfs_attr_node_addname(
> >   	xfs_da_state_free(state);
> >   	state = NULL;
> > -	/*
> > -	 * Commit the leaf addition or btree split and start the next
> > -	 * trans in the chain.
> > -	 */
> > -	error = xfs_trans_roll_inode(&args->trans, dp);
> > +	return 0;
> > +
> > +out:
> > +	if (state)
> > +		xfs_da_state_free(state);
> >   	if (error)
> > -		goto out;
> > +		return error;
> > +	return retval;
> > +}
> > +
> > +STATIC int
> > +xfs_attr_node_addname_work(
> > +	struct xfs_da_args	*args)
> > +{
> > +	struct xfs_da_state	*state;
> > +	struct xfs_da_state_blk	*blk;
> > +	int			retval, error;
> >   	/*
> >   	 * If there was an out-of-line value, allocate the blocks we
> > 
>
Allison Henderson Jan. 5, 2021, 6:10 p.m. UTC | #11
On 1/4/21 10:52 AM, Brian Foster wrote:
> On Thu, Dec 24, 2020 at 01:23:24AM -0700, Allison Henderson wrote:
>>
>>
>> On 12/23/20 7:16 AM, Brian Foster wrote:
>>> On Tue, Dec 22, 2020 at 10:20:16PM -0700, Allison Henderson wrote:
>>>>
>>>>
>>>> On 12/22/20 11:44 AM, Brian Foster wrote:
>>>>> On Tue, Dec 22, 2020 at 12:20:20PM -0500, Brian Foster wrote:
>>>>>> On Tue, Dec 22, 2020 at 12:11:48PM -0500, Brian Foster wrote:
>>>>>>> On Fri, Dec 18, 2020 at 12:29:06AM -0700, Allison Henderson wrote:
>>>>>>>> This patch modifies the attr remove routines to be delay ready. This
>>>>>>>> means they no longer roll or commit transactions, but instead return
>>>>>>>> -EAGAIN to have the calling routine roll and refresh the transaction. In
>>>>>>>> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
>>>>>>>> uses a sort of state machine like switch to keep track of where it was
>>>>>>>> when EAGAIN was returned. xfs_attr_node_removename has also been
>>>>>>>> modified to use the switch, and a new version of xfs_attr_remove_args
>>>>>>>> consists of a simple loop to refresh the transaction until the operation
>>>>>>>> is completed. A new XFS_DAC_DEFER_FINISH flag is used to finish the
>>>>>>>> transaction where ever the existing code used to.
>>>>>>>>
>>>>>>>> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
>>>>>>>> version __xfs_attr_rmtval_remove. We will rename
>>>>>>>> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
>>>>>>>> done.
>>>>>>>>
>>>>>>>> xfs_attr_rmtval_remove itself is still in use by the set routines (used
>>>>>>>> during a rename).  For reasons of preserving existing function, we
>>>>>>>> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
>>>>>>>> set.  Similar to how xfs_attr_remove_args does here.  Once we transition
>>>>>>>> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
>>>>>>>> used and will be removed.
>>>>>>>>
>>>>>>>> This patch also adds a new struct xfs_delattr_context, which we will use
>>>>>>>> to keep track of the current state of an attribute operation. The new
>>>>>>>> xfs_delattr_state enum is used to track various operations that are in
>>>>>>>> progress so that we know not to repeat them, and resume where we left
>>>>>>>> off before EAGAIN was returned to cycle out the transaction. Other
>>>>>>>> members take the place of local variables that need to retain their
>>>>>>>> values across multiple function recalls.  See xfs_attr.h for a more
>>>>>>>> detailed diagram of the states.
>>>>>>>>
>>>>>>>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>>>>>>>> ---
>>>>>>>
>>>>>>> I started with a couple small comments on this patch but inevitably
>>>>>>> started thinking more about the factoring again and ended up with a
>>>>>>> couple patches on top. The first is more of some small tweaks and
>>>>>>> open-coding that IMO makes this patch a bit easier to follow. The
>>>>>>> second is more of an RFC so I'll follow up with that in a second email.
>>>>>>> I'm curious what folks' thoughts might be on either. Also note that I'm
>>>>>>> primarily focusing on code structure and whatnot here, so these are fast
>>>>>>> and loose, compile tested only and likely to be broken.
>>>>>>>
>>>>>>
>>>>>> ... and here's the second diff (applies on top of the first).
>>>>>>
>>>>>> This one popped up after staring at the previous changes for a bit and
>>>>>> wondering whether using "done flags" might make the whole thing easier
>>>>>> to follow than incremental state transitions. I think the attr remove
>>>>>> path is easy enough to follow with either method, but the attr set path
>>>>>> is a beast and so this is more with that in mind. Initial thoughts?
>>>>>>
>>>>>
>>>>> Eh, the more I stare at the attr set code I'm not sure this by itself is
>>>>> much of an improvement. It helps in some areas, but there are so many
>>>>> transaction rolls embedded throughout at different levels that a larger
>>>>> rework of the code is probably still necessary. Anyways, this was just a
>>>>> random thought for now..
>>>>>
>>>>> Brian
>>>>
>>>> No worries, I know the feeling :-)  The set works and all, but I do think
>>>> there is a struggle around trying to find a particularly pleasant looking
>>>> presentation of it.  Especially when we get into the set path, it's a bit
>>>> more complex.  I may pick through the patches you have here and pick up the
>>>> whitespace cleanups and other style adjustments if people prefer it that
>>>> way.  The good news is, a lot of the *_args routines are supposed to
>>>> disappear at the end of the set, so there's not really a need to invest too
>>>> much in them I suppose. It may help to jump to the "Set up infastructure"
>>>> patch too.  I've expanded the diagram to try and help illustrate the code
>>>> flow a bit, so that may help with following the code flow.
>>>>
>>>
>>> I'm sure.. :P Note that the first patch was more smaller tweaks and
>>> refactoring with the existing model in mind. For the set path, the
>>> challenge IMO is to make the code generally more readable. I think the
>>> remove path accomplishes this for the most part because the states and
>>> whatnot are fairly low overhead on top of the existing complexity. This
>>> changes considerably for the set path, not so much due to the mechanism
>>> but because the baseline code is so fragmented and complex from the
>>> start. I am slightly concerned that bolting state management onto the
>>> current code as such might make it harder to grok and clean up after the
>>> fact, but I could be wrong about that (my hope was certainly for the
>>> opposite).
>> tbh, every time I do another spin of the set, I actually make all my
>> modifications on top of the extended set, with parent pointers and all, and
>> make sure all the test cases are still good.  I know pptrs are still pretty
>> far out from here, but they're actually the best testcase for this, because
>> it generates so much more activity.  If all that's still golden, then I'll
>> pull them back down into the lower subsets and work out all the conflicts on
>> the way back up.  If something went wrong, diffing the branch heads tracks
>> it down pretty fast.
>>
> 
> Indeed, that's a good thing. My comment was more around the readability
> of the code and subsequent ability to clean it up, reduce the number of
> required states, etc...
> 
>>>
>>> Regardless, that had me shifting focus a bit and playing around with the
>>> current upstream code as opposed to shifting around your code. ISTM that
>>> there is some commonality across the various set codepaths and perhaps
>>> there is potential to simplify things notably _before_ applying the
>>> state management scheme. I've appended a new diff below (based on
>>> for-next) that starts to demonstrate what I mean. Note again that this
>>> is similarly fast and loose as I've knowingly thrown away some quirks of
>>> the code (i.e. leaf buffer bhold) for the purpose of quickly trying to
>>> explore/POC whether the factoring might be sane and plausible.
>>>
>>> In summary, this combines the "try addname" part of each xattr format to
>>> fall under a single transaction rolling loop such that I think the
>>> resulting function could become one high level state. I ran out of time
>>> for working through the rest, but from a read through it seems there's
>>> at least a chance we could continue with similar refactoring and
>>> reduction to a fewer number of generic states (vs. more format-specific
>>> states). For example, the remaining parts of the set operation all seem
>>> to have something along the lines of the following high level
>>> components:
>>>
>>> - remote value block allocation (and value set)
>>> - if rename == true, clear flag and done
>>> - if rename == false, flip flags
>>> 	- remove old xattr (i.e., similar to xattr remove)
>>>
>>> ... where much of that code looks remarkably similar across the
>>> different leaf/node code branches. So I'm curious what you and others
>>> following along might think about something like this as an intermediate
>>> step...
>>
>> Yes, I had noticed similarities when we first started, though I got the
>> impression that people mostly wanted to focus on just hoisting the
>> transactions upwards.  I did look at them at one point, but seem to recall
>> the similarities having just enough dissimilarities such that trying to
>> consolidate them tends to introduce about as much plumbing with if/else's.
>> In any case, I do think the solution here with the format handling is
>> creative, and may reduce a state or two, but I'd really need to see it
>> through the test cases to know if it's going to work.  From what you've
>> hashed out here, I think I get the idea. It's hard for me to comment on
>> readability because I've been up and down the code so much.  I do think it's
>> a little loopy looking, but so is the statemachine.  Maybe a good spot for
>> others to chime in too.
>>
> 
> Can you elaborate on what you mean by loopy? :P I'm sure you noticed I
> borrowed the transaction rolling mechanism from your infra patch..
> 
Well, that borrowed loop is meant to disappear at the end of the
set though.  This part with *_set_fmt we would have to keep.  I guess
that really means the *_set_fmt call would probably get consolidated
into the *_iter routine though.  Let me see if I can get something like
this to work on top of the set so it's a bit more clear what it would
look like.  I think this modification would actually look simpler if it
came in after the state machine.  Otherwise you're trying to introduce
the transaction loop early.  Really its purpose is just to get the state
machine working, and then we get rid of it later.

> But yeah, I'm partly to blame for the hoisting approach as well. I was
> thinking/hoping that seeing the various states would facilitate
> simplification of the code, but my first reaction when looking at the
> (much more complex) xattr set path is more confusion than clarity. I see
> the code drop into state management, using that to call into
> format-specific helpers, then fall into doing some other stuff that
> might call into some of the same format-specific add helpers, then
> realize I'll probably have to trace up and down through the whole path
> to make some sense of the execution flow. 

Yeah, I think this question is very much a matter of preference.  See, initially,
I thought the pattern of pairing states to gotos sort of alleviated the
anxiety of needing to trace up and down the code:


    /*
     * We're going away for a bit to cycle the transaction,
     * but we're gonna come back ....
     */
    dela_state = XFS_DAS_UNIQUE_STATE;
    return -EAGAIN;

xfs_das_unique_state:
    /* ...and resume execution here */


Granted, sometimes we can use the state of the attr to get away from
needing this, but now you have to re-read the code in the context of
whatever form we're in to figure out that we land back in the same place. I
realize this is sort of a unique pattern, so I understand people wanting
to explore the idea of simplifying it away.  At this point I feel like I
can follow it either way, so it's really what folks are more comfortable
with.
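
For anyone who wants to see the state-plus-goto pattern end to end, here
is a small standalone sketch of it. It is a userspace illustration only:
the demo_* names, the states and the rolls counter are made up for this
example and are not the identifiers used in the series. The iter routine
saves a state before returning -EAGAIN, and the switch at the top jumps
back to the matching label on the next call, while the caller loop stands
in for rolling the transaction.

```c
#include <errno.h>

/* Hypothetical states; not the names used in the series. */
enum demo_state {
	DEMO_UNINIT = 0,
	DEMO_STEP_TWO,
	DEMO_STEP_THREE,
};

/* Context that has to survive across calls, like a delattr context. */
struct demo_ctx {
	enum demo_state	state;
	int		steps_done;	/* takes the place of a local variable */
	int		rolls;		/* counts simulated transaction rolls */
};

/* One call does one transaction's worth of work. */
static int demo_iter(struct demo_ctx *ctx)
{
	/* Resume where we left off before the last -EAGAIN. */
	switch (ctx->state) {
	case DEMO_STEP_TWO:
		goto demo_step_two;
	case DEMO_STEP_THREE:
		goto demo_step_three;
	default:
		break;
	}

	ctx->steps_done++;
	/* Going away to cycle the transaction, but we will come back. */
	ctx->state = DEMO_STEP_TWO;
	return -EAGAIN;

demo_step_two:
	/* ...and resume execution here. */
	ctx->steps_done++;
	ctx->state = DEMO_STEP_THREE;
	return -EAGAIN;

demo_step_three:
	ctx->steps_done++;
	return 0;
}

/* Stand-in for finishing deferred work and rolling the transaction. */
static int demo_roll(struct demo_ctx *ctx)
{
	ctx->rolls++;
	return 0;
}

/* Caller loop: refresh the "transaction" until the operation completes. */
static int demo_run(struct demo_ctx *ctx)
{
	int error;

	do {
		error = demo_iter(ctx);
		if (error != -EAGAIN)
			break;
		error = demo_roll(ctx);
	} while (!error);

	return error;
}
```

With a zeroed context, demo_run() calls demo_iter() three times and
rolls twice in between, which is the shape the remove path follows.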

> That is what has me wondering
> whether this would become more simple with fewer, generic and higher
> level states like SET_FORMAT (i.e. what I hacked up), SET_NAME,
> SET_VALUE (rmt block allocs), SET_FLAG (clear or flip), and then finally
> fall into the remove path in the rename case.
> 
> We'd ultimately implement the same type of state machine approach, it
> would just require more up front cleanup rework than the other way
> around, and hopefully land fairly simplified from the onset. Of course
> those states are just off the top of my head so might not be feasible,
> but I'm also curious if any others following along might have thoughts
> one way or the other. I'm sure we could implement things in either order
> when it comes down to it...
Yeah, let me see if it's feasible, and what it ends up looking like.
I'm kind of of the opinion that if you have to have a certain degree of
complexity (i.e. setting states, and resuming with gotos), you may as
well leverage what it can do.  Once you absorb that pattern, it's
not so scary the next time you see it.  Simplifying is certainly a good
thing, but if it breaks the pattern that keeps a more complex concept
organized, the simplification might not make as much sense to others.  I
think it's likely a spot for others to chime in.  After looking
at the same code for a while, it's hard to put yourself in the POV of
someone else still trying to work through it.  :-)

Allison

> 
> Brian
> 
>> I actually find it easier to work on it from the top of the set rather than
>> the bottom.  Just so that the end goal of what it will end up looking like
>> is a little more clear.  Once the goal is clear, then I worry about layering
>> it in whatever patch it goes in.  Otherwise it's harder to see exactly how
>> the conflicts shake out.
>>
>> Allison
>>>
>>> Brian
>>>
>>> --- 8< ---
>>>
>>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>>> index fd8e6418a0d3..eff8833d5303 100644
>>> --- a/fs/xfs/libxfs/xfs_attr.c
>>> +++ b/fs/xfs/libxfs/xfs_attr.c
>>> @@ -58,6 +58,8 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>>    				 struct xfs_da_state **state);
>>>    STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>>>    STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
>>> +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *, struct xfs_buf *);
>>> +STATIC int xfs_attr_node_addname_work(struct xfs_da_args *);
>>>    int
>>>    xfs_inode_hasattr(
>>> @@ -216,116 +218,93 @@ xfs_attr_is_shortform(
>>>    		ip->i_afp->if_nextents == 0);
>>>    }
>>> -/*
>>> - * Attempts to set an attr in shortform, or converts short form to leaf form if
>>> - * there is not enough room.  If the attr is set, the transaction is committed
>>> - * and set to NULL.
>>> - */
>>> -STATIC int
>>> -xfs_attr_set_shortform(
>>> +int
>>> +xfs_attr_set_fmt(
>>>    	struct xfs_da_args	*args,
>>> -	struct xfs_buf		**leaf_bp)
>>> +	bool			*done)
>>>    {
>>>    	struct xfs_inode	*dp = args->dp;
>>> -	int			error, error2 = 0;
>>> +	struct xfs_buf		*leaf_bp = NULL;
>>> +	int			error = 0;
>>> -	/*
>>> -	 * Try to add the attr to the attribute list in the inode.
>>> -	 */
>>> -	error = xfs_attr_try_sf_addname(dp, args);
>>> -	if (error != -ENOSPC) {
>>> -		error2 = xfs_trans_commit(args->trans);
>>> -		args->trans = NULL;
>>> -		return error ? error : error2;
>>> +	if (xfs_attr_is_shortform(dp)) {
>>> +		error = xfs_attr_try_sf_addname(dp, args);
>>> +		if (!error)
>>> +			*done = true;
>>> +		if (error != -ENOSPC)
>>> +			return error;
>>> +
>>> +		error = xfs_attr_shortform_to_leaf(args, &leaf_bp);
>>> +		if (error)
>>> +			return error;
>>> +		return -EAGAIN;
>>>    	}
>>> -	/*
>>> -	 * It won't fit in the shortform, transform to a leaf block.  GROT:
>>> -	 * another possible req'mt for a double-split btree op.
>>> -	 */
>>> -	error = xfs_attr_shortform_to_leaf(args, leaf_bp);
>>> -	if (error)
>>> -		return error;
>>> -	/*
>>> -	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
>>> -	 * push cannot grab the half-baked leaf buffer and run into problems
>>> -	 * with the write verifier. Once we're done rolling the transaction we
>>> -	 * can release the hold and add the attr to the leaf.
>>> -	 */
>>> -	xfs_trans_bhold(args->trans, *leaf_bp);
>>> -	error = xfs_defer_finish(&args->trans);
>>> -	xfs_trans_bhold_release(args->trans, *leaf_bp);
>>> -	if (error) {
>>> -		xfs_trans_brelse(args->trans, *leaf_bp);
>>> -		return error;
>>> +	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>>> +		struct xfs_buf	*bp = NULL;
>>> +
>>> +		error = xfs_attr_leaf_try_add(args, bp);
>>> +		if (error != -ENOSPC)
>>> +			return error;
>>> +
>>> +		error = xfs_attr3_leaf_to_node(args);
>>> +		if (error)
>>> +			return error;
>>> +		return -EAGAIN;
>>>    	}
>>> -	return 0;
>>> +	return xfs_attr_node_addname(args);
>>>    }
>>>    /*
>>>     * Set the attribute specified in @args.
>>>     */
>>>    int
>>> -xfs_attr_set_args(
>>> +__xfs_attr_set_args(
>>>    	struct xfs_da_args	*args)
>>>    {
>>>    	struct xfs_inode	*dp = args->dp;
>>> -	struct xfs_buf          *leaf_bp = NULL;
>>>    	int			error = 0;
>>> -	/*
>>> -	 * If the attribute list is already in leaf format, jump straight to
>>> -	 * leaf handling.  Otherwise, try to add the attribute to the shortform
>>> -	 * list; if there's no room then convert the list to leaf format and try
>>> -	 * again.
>>> -	 */
>>> -	if (xfs_attr_is_shortform(dp)) {
>>> -
>>> -		/*
>>> -		 * If the attr was successfully set in shortform, the
>>> -		 * transaction is committed and set to NULL.  Otherwise, is it
>>> -		 * converted from shortform to leaf, and the transaction is
>>> -		 * retained.
>>> -		 */
>>> -		error = xfs_attr_set_shortform(args, &leaf_bp);
>>> -		if (error || !args->trans)
>>> -			return error;
>>> -	}
>>> -
>>>    	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>>>    		error = xfs_attr_leaf_addname(args);
>>> -		if (error != -ENOSPC)
>>> -			return error;
>>> -
>>> -		/*
>>> -		 * Promote the attribute list to the Btree format.
>>> -		 */
>>> -		error = xfs_attr3_leaf_to_node(args);
>>>    		if (error)
>>>    			return error;
>>> +	}
>>> +
>>> +	error = xfs_attr_node_addname_work(args);
>>> +	return error;
>>> +}
>>> +
>>> +int
>>> +xfs_attr_set_args(
>>> +	struct xfs_da_args	*args)
>>> +
>>> +{
>>> +	int			error;
>>> +	bool			done = false;
>>> +
>>> +	do {
>>> +		error = xfs_attr_set_fmt(args, &done);
>>> +		if (error != -EAGAIN)
>>> +			break;
>>> -		/*
>>> -		 * Finish any deferred work items and roll the transaction once
>>> -		 * more.  The goal here is to call node_addname with the inode
>>> -		 * and transaction in the same state (inode locked and joined,
>>> -		 * transaction clean) no matter how we got to this step.
>>> -		 */
>>>    		error = xfs_defer_finish(&args->trans);
>>>    		if (error)
>>> -			return error;
>>> +			break;
>>> +		error = xfs_trans_roll_inode(&args->trans, args->dp);
>>> +	} while (!error);
>>> -		/*
>>> -		 * Commit the current trans (including the inode) and
>>> -		 * start a new one.
>>> -		 */
>>> -		error = xfs_trans_roll_inode(&args->trans, dp);
>>> -		if (error)
>>> -			return error;
>>> -	}
>>> +	if (error || done)
>>> +		return error;
>>> -	error = xfs_attr_node_addname(args);
>>> -	return error;
>>> +	error = xfs_defer_finish(&args->trans);
>>> +	if (!error)
>>> +		error = xfs_trans_roll_inode(&args->trans, args->dp);
>>> +	if (error)
>>> +		return error;
>>> +
>>> +	return __xfs_attr_set_args(args);
>>>    }
>>>    /*
>>> @@ -676,18 +655,6 @@ xfs_attr_leaf_addname(
>>>    	trace_xfs_attr_leaf_addname(args);
>>> -	error = xfs_attr_leaf_try_add(args, bp);
>>> -	if (error)
>>> -		return error;
>>> -
>>> -	/*
>>> -	 * Commit the transaction that added the attr name so that
>>> -	 * later routines can manage their own transactions.
>>> -	 */
>>> -	error = xfs_trans_roll_inode(&args->trans, dp);
>>> -	if (error)
>>> -		return error;
>>> -
>>>    	/*
>>>    	 * If there was an out-of-line value, allocate the blocks we
>>>    	 * identified for its storage and copy the value.  This is done
>>> @@ -923,7 +890,7 @@ xfs_attr_node_addname(
>>>    	 * Fill in bucket of arguments/results/context to carry around.
>>>    	 */
>>>    	dp = args->dp;
>>> -restart:
>>> +
>>>    	/*
>>>    	 * Search to see if name already exists, and get back a pointer
>>>    	 * to where it should go.
>>> @@ -967,21 +934,10 @@ xfs_attr_node_addname(
>>>    			xfs_da_state_free(state);
>>>    			state = NULL;
>>>    			error = xfs_attr3_leaf_to_node(args);
>>> -			if (error)
>>> -				goto out;
>>> -			error = xfs_defer_finish(&args->trans);
>>>    			if (error)
>>>    				goto out;
>>> -			/*
>>> -			 * Commit the node conversion and start the next
>>> -			 * trans in the chain.
>>> -			 */
>>> -			error = xfs_trans_roll_inode(&args->trans, dp);
>>> -			if (error)
>>> -				goto out;
>>> -
>>> -			goto restart;
>>> +			return -EAGAIN;
>>>    		}
>>>    		/*
>>> @@ -993,9 +949,6 @@ xfs_attr_node_addname(
>>>    		error = xfs_da3_split(state);
>>>    		if (error)
>>>    			goto out;
>>> -		error = xfs_defer_finish(&args->trans);
>>> -		if (error)
>>> -			goto out;
>>>    	} else {
>>>    		/*
>>>    		 * Addition succeeded, update Btree hashvals.
>>> @@ -1010,13 +963,23 @@ xfs_attr_node_addname(
>>>    	xfs_da_state_free(state);
>>>    	state = NULL;
>>> -	/*
>>> -	 * Commit the leaf addition or btree split and start the next
>>> -	 * trans in the chain.
>>> -	 */
>>> -	error = xfs_trans_roll_inode(&args->trans, dp);
>>> +	return 0;
>>> +
>>> +out:
>>> +	if (state)
>>> +		xfs_da_state_free(state);
>>>    	if (error)
>>> -		goto out;
>>> +		return error;
>>> +	return retval;
>>> +}
>>> +
>>> +STATIC int
>>> +xfs_attr_node_addname_work(
>>> +	struct xfs_da_args	*args)
>>> +{
>>> +	struct xfs_da_state	*state;
>>> +	struct xfs_da_state_blk	*blk;
>>> +	int			retval, error;
>>>    	/*
>>>    	 * If there was an out-of-line value, allocate the blocks we
>>>
>>
>
Brian Foster Jan. 6, 2021, 2:25 p.m. UTC | #12
On Tue, Jan 05, 2021 at 11:10:27AM -0700, Allison Henderson wrote:
> 
> 
> On 1/4/21 10:52 AM, Brian Foster wrote:
> > On Thu, Dec 24, 2020 at 01:23:24AM -0700, Allison Henderson wrote:
> > > 
> > > 
> > > On 12/23/20 7:16 AM, Brian Foster wrote:
> > > > On Tue, Dec 22, 2020 at 10:20:16PM -0700, Allison Henderson wrote:
> > > > > 
> > > > > 
> > > > > On 12/22/20 11:44 AM, Brian Foster wrote:
> > > > > > On Tue, Dec 22, 2020 at 12:20:20PM -0500, Brian Foster wrote:
> > > > > > > On Tue, Dec 22, 2020 at 12:11:48PM -0500, Brian Foster wrote:
> > > > > > > > On Fri, Dec 18, 2020 at 12:29:06AM -0700, Allison Henderson wrote:
> > > > > > > > > This patch modifies the attr remove routines to be delay ready. This
> > > > > > > > > means they no longer roll or commit transactions, but instead return
> > > > > > > > > -EAGAIN to have the calling routine roll and refresh the transaction. In
> > > > > > > > > this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> > > > > > > > > uses a sort of state machine like switch to keep track of where it was
> > > > > > > > > when EAGAIN was returned. xfs_attr_node_removename has also been
> > > > > > > > > modified to use the switch, and a new version of xfs_attr_remove_args
> > > > > > > > > consists of a simple loop to refresh the transaction until the operation
> > > > > > > > > is completed. A new XFS_DAC_DEFER_FINISH flag is used to finish the
> > > > > > > > > transaction where ever the existing code used to.
> > > > > > > > > 
> > > > > > > > > Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> > > > > > > > > version __xfs_attr_rmtval_remove. We will rename
> > > > > > > > > __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> > > > > > > > > done.
> > > > > > > > > 
> > > > > > > > > xfs_attr_rmtval_remove itself is still in use by the set routines (used
> > > > > > > > > during a rename).  For reasons of preserving existing function, we
> > > > > > > > > modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> > > > > > > > > set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> > > > > > > > > the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> > > > > > > > > used and will be removed.
> > > > > > > > > 
> > > > > > > > > This patch also adds a new struct xfs_delattr_context, which we will use
> > > > > > > > > to keep track of the current state of an attribute operation. The new
> > > > > > > > > xfs_delattr_state enum is used to track various operations that are in
> > > > > > > > > progress so that we know not to repeat them, and resume where we left
> > > > > > > > > off before EAGAIN was returned to cycle out the transaction. Other
> > > > > > > > > members take the place of local variables that need to retain their
> > > > > > > > > values across multiple function recalls.  See xfs_attr.h for a more
> > > > > > > > > detailed diagram of the states.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > > > > > > > > ---
> > > > > > > > 
> > > > > > > > I started with a couple small comments on this patch but inevitably
> > > > > > > > started thinking more about the factoring again and ended up with a
> > > > > > > > couple patches on top. The first is more of some small tweaks and
> > > > > > > > open-coding that IMO makes this patch a bit easier to follow. The
> > > > > > > > second is more of an RFC so I'll follow up with that in a second email.
> > > > > > > > I'm curious what folks' thoughts might be on either. Also note that I'm
> > > > > > > > primarily focusing on code structure and whatnot here, so these are fast
> > > > > > > > and loose, compile tested only and likely to be broken.
> > > > > > > > 
> > > > > > > 
> > > > > > > ... and here's the second diff (applies on top of the first).
> > > > > > > 
> > > > > > > This one popped up after staring at the previous changes for a bit and
> > > > > > > wondering whether using "done flags" might make the whole thing easier
> > > > > > > to follow than incremental state transitions. I think the attr remove
> > > > > > > path is easy enough to follow with either method, but the attr set path
> > > > > > > is a beast and so this is more with that in mind. Initial thoughts?
> > > > > > > 
> > > > > > 
> > > > > > Eh, the more I stare at the attr set code I'm not sure this by itself is
> > > > > > much of an improvement. It helps in some areas, but there are so many
> > > > > > transaction rolls embedded throughout at different levels that a larger
> > > > > > rework of the code is probably still necessary. Anyways, this was just a
> > > > > > random thought for now..
> > > > > > 
> > > > > > Brian
> > > > > 
> > > > > > No worries, I know the feeling :-)  The set works and all, but I do think
> > > > > > there is a struggle around trying to find a particularly pleasant looking
> > > > > > presentation of it.  Especially when we get into the set path, it's a bit
> > > > > > more complex.  I may pick through the patches you have here and pick up the
> > > > > > whitespace cleanups and other style adjustments if people prefer it that
> > > > > > way.  The good news is, a lot of the *_args routines are supposed to
> > > > > > disappear at the end of the set, so there's not really a need to invest too
> > > > > > much in them I suppose. It may help to jump to the "Set up infastructure"
> > > > > > patch too.  I've expanded the diagram to try and help illustrate the code
> > > > > > flow a bit, so that may help with following the code flow.
> > > > > 
> > > > 
> > > > I'm sure.. :P Note that the first patch was more smaller tweaks and
> > > > refactoring with the existing model in mind. For the set path, the
> > > > challenge IMO is to make the code generally more readable. I think the
> > > > remove path accomplishes this for the most part because the states and
> > > > whatnot are fairly low overhead on top of the existing complexity. This
> > > > changes considerably for the set path, not so much due to the mechanism
> > > > but because the baseline code is so fragmented and complex from the
> > > > start. I am slightly concerned that bolting state management onto the
> > > > current code as such might make it harder to grok and clean up after the
> > > > fact, but I could be wrong about that (my hope was certainly for the
> > > > opposite).
> > > > tbh, every time I do another spin of the set, I actually make all my
> > > > modifications on top of the extended set, with parent pointers and all, and
> > > > make sure all the test cases are still good.  I know pptrs are still pretty
> > > > far out from here, but they're actually the best testcase for this, because
> > > > they generate so much more activity.  If all that's still golden, then I'll
> > > > pull them back down into the lower subsets and work out all the conflicts on
> > > > the way back up.  If something went wrong, diffing the branch heads tracks
> > > > it down pretty fast.
> > > 
> > 
> > Indeed, that's a good thing. My comment was more around the readability
> > of the code and subsequent ability to clean it up, reduce the number of
> > required states, etc...
> > 
> > > > 
> > > > Regardless, that had me shifting focus a bit and playing around with the
> > > > current upstream code as opposed to shifting around your code. ISTM that
> > > > there is some commonality across the various set codepaths and perhaps
> > > > there is potential to simplify things notably _before_ applying the
> > > > state management scheme. I've appended a new diff below (based on
> > > > for-next) that starts to demonstrate what I mean. Note again that this
> > > > > is similarly fast and loose as I've knowingly thrown away some quirks of
> > > > the code (i.e. leaf buffer bhold) for the purpose of quickly trying to
> > > > explore/POC whether the factoring might be sane and plausible.
> > > > 
> > > > In summary, this combines the "try addname" part of each xattr format to
> > > > fall under a single transaction rolling loop such that I think the
> > > > resulting function could become one high level state. I ran out of time
> > > > for working through the rest, but from a read through it seems there's
> > > > at least a chance we could continue with similar refactoring and
> > > > reduction to a fewer number of generic states (vs. more format-specific
> > > > states). For example, the remaining parts of the set operation all seem
> > > > to have something along the lines of the following high level
> > > > components:
> > > > 
> > > > - remote value block allocation (and value set)
> > > > - if rename == true, clear flag and done
> > > > - if rename == false, flip flags
> > > > 	- remove old xattr (i.e., similar to xattr remove)
> > > > 
> > > > ... where much of that code looks remarkably similar across the
> > > > different leaf/node code branches. So I'm curious what you and others
> > > > following along might think about something like this as an intermediate
> > > > step...
> > > 
> > > Yes, I had noticed similarities when we first started, though I got the
> > > impression that people mostly wanted to focus on just hoisting the
> > > transactions upwards.  I did look at them at one point, but seem to recall
> > > > the similarities having just enough dissimilarities such that trying to
> > > > consolidate them tends to introduce about as much plumbing with if/else's.
> > > In any case, I do think the solution here with the format handling is
> > > creative, and may reduce a state or two, but I'd really need to see it
> > > through the test cases to know if it's going to work.  From what you've
> > > hashed out here, I think I get the idea. It's hard for me to comment on
> > > readability because I've been up and down the code so much.  I do think it's
> > > a little loopy looking, but so is the statemachine.  Maybe a good spot for
> > > others to chime in too.
> > > 
> > 
> > Can you elaborate on what you mean by loopy? :P I'm sure you noticed I
> > borrowed the transaction rolling mechanism from your infra patch..
> > 
> Well, that loop that is borrowed is meant to disappear at the end of the set
> though.  This part with *_set_fmt we would have to keep.  I guess that
> really means the *_set_fmt call would probably get consolidated into the
> *_iter routine though.  Let me see if I can get something like this to work
> on top of the set so it's a bit more clear what it would look like.  I think
> this modification would actually look simpler if it came in after the
> statemachine.  Otherwise you're trying to introduce the transaction loop
> early.  Really its purpose is just to get the state machine working, and
> then we get rid of it later.
> 

Sort of... the idea is more to reduce code duplication across the
currently separate codepaths to hopefully reduce the number of states
required. Note that the intent isn't to simplify away the state machine
approach entirely, but to simply reduce the number of states so the
resulting complexity of the set path is more in line with the remove
path. Given that, I'm not sure why this would imply we'd need to retain
the transaction loop, for example. I'd expect your subsequent
infrastructure changes and general state management approach to remain
fundamentally the same, only hopefully with fewer branches/states.
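
To illustrate the kind of consolidation I mean, here is a rough
standalone sketch of the "try addname, else convert format and ask for
a roll" step from the diff below. It is only a model: sketch_fork,
sketch_set_fmt, sketch_add_name and the per-format capacities are
invented for the example, and a counter stands in for
xfs_defer_finish() + xfs_trans_roll_inode(). As in the diff, *done is
only set in the shortform case, where nothing else remains to be done.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three attr fork formats. */
enum sketch_fmt { SK_FMT_SHORTFORM, SK_FMT_LEAF, SK_FMT_NODE };

struct sketch_fork {
	enum sketch_fmt	fmt;
	int		used;	/* number of names currently stored */
};

/* Arbitrary capacities, purely so the example has something to overflow. */
static const int sketch_cap[] = { 2, 4, 1000 };

/*
 * One pass: add the name if it fits in the current format, otherwise
 * convert to the next format and return -EAGAIN so the caller rolls.
 */
int
sketch_set_fmt(struct sketch_fork *f, bool *done)
{
	assert(f != NULL && done != NULL);

	if (f->used < sketch_cap[f->fmt]) {
		f->used++;
		if (f->fmt == SK_FMT_SHORTFORM)
			*done = true;	/* shortform: nothing left to do */
		return 0;
	}
	if (f->fmt == SK_FMT_NODE)
		return -ENOSPC;

	/* shortform -> leaf, or leaf -> node */
	f->fmt = (f->fmt == SK_FMT_SHORTFORM) ? SK_FMT_LEAF : SK_FMT_NODE;
	return -EAGAIN;
}

/* Single rolling loop shared by all formats. */
int
sketch_add_name(struct sketch_fork *f, bool *done, int *rolls)
{
	int error;

	while ((error = sketch_set_fmt(f, done)) == -EAGAIN)
		(*rolls)++;	/* stand-in for defer finish + trans roll */
	return error;
}
```

The whole loop would then plausibly be one high level state rather than
one state per format.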

Indeed, it may be possible to do this kind of thing before or after the
infrastructure changes. I highly suspect the latter might seem more
simple to you being more familiar with the new code while the former
might seem more simple to somebody like me who is much less so. ;)

> > But yeah, I'm partly to blame for the hoisting approach as well. I was
> > thinking/hoping that seeing the various states would facilitate
> > simplification of the code, but my first reaction when looking at the
> > (much more complex) xattr set path is more confusion than clarity. I see
> > the code drop into state management, using that to call into
> > format-specific helpers, then fall into doing some other stuff that
> > might call into some of the same format-specific add helpers, then
> > realize I'll probably have to trace up and down through the whole path
> > to make some sense of the execution flow.
> 
> Yeah, I think this question is very preference oriented.  See, initially, I
> thought the pattern of pairing states to gotos sort of alleviated the
> anxiety of needing to trace up and down the code:
> 
> 
>    /*
>     * We're going away for a bit to cycle the transaction,
>     * but we're gonna come back ....
>     */
>    dela_state = XFS_DAS_UNIQUE_STATE;
>    return -EAGAIN;
> 
> xfs_das_unique_state:
>    /* ...and resume execution here */
> 
> 
> Granted, sometimes we can use the state of the attr to get away from needing
> this, but now you have to re-read the code in the context of whatever form
> we're in to figure out that we land back in the same place. I realize this is
> sort of a unique pattern, so I understand people wanting to explore the idea
> of simplifying it away.  At this point I feel like I can follow it either
> way, so it's really what folks are more comfortable with.
> 

As above, note that I'm definitely not attempting to simplify the
broader pattern away. Just exploring cleanups to the xattr set code to
reduce the complexity of the transition. The reason the patch I posted
doesn't have any state management is IIRC I only went as far as possible
before we'd probably need to define the first state. ;)

> > That is what has me wondering
> > whether this would become more simple with fewer, generic and higher
> > level states like SET_FORMAT (i.e. what I hacked up), SET_NAME,
> > SET_VALUE (rmt block allocs), SET_FLAG (clear or flip), and then finally
> > fall into the remove path in the rename case.
> > 
> > We'd ultimately implement the same type of state machine approach, it
> > would just require more up front cleanup rework than the other way
> > around, and hopefully land fairly simplified from the onset. Of course
> > those states are just off the top of my head so might not be feasible,
> > but I'm also curious if any others following along might have thoughts
> > one way or the other. I'm sure we could implement things in either order
> > when it comes down to it...
> Yeah, let me see if it's feasible, and what it ends up looking like. I'm
> kind of of the opinion that if you have to have a certain degree of
> complexity (i.e. setting states, and resuming with gotos), you may as well
> leverage it for what it can do.  Once you absorb that pattern, it's not so
> scary the next time you see it.  Simplifying is certainly a good thing, but
> if it breaks the pattern that keeps a more complex concept organized, the
> simplification might not make as much sense to others.  I think it's likely
> a spot for others to chime in.  After looking at the same code for a
> while, it's hard to put yourself in the POV of someone else still trying to
> work through it.  :-)
> 

The current organization of the code is what concerns me more so than the
broader infrastructure or state patterns in general. IOW, I don't
actually see an obvious pattern emerge from reading through
xfs_attr_set_iter(), for example. I see some state code that jumps into
format helpers, followed by shortform code and then leaf/node addname
calls into similar or related calls seen at the top. This diverges from
the previously discussed goal of seeing all of the state management bits
at one level such that the execution flow of the operation is as obvious
as possible. Hence, I'm wondering if the reduced number of states
facilitates that goal, but perhaps I could dig further into it from that
angle as well...
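
As a strawman for what "all of the state management at one level" could
look like with the generic states mentioned earlier, consider the
standalone sketch below. None of this exists in the tree: the SK_*
states, sketch_ctx and the functions are hypothetical, the per-state
work is reduced to comments, SK_REMOVE_OLD is my own placeholder for
"fall into the remove path", and a counter stands in for the
transaction roll.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical generic states, not part of the posted series. */
enum sketch_state {
	SK_SET_FORMAT,
	SK_SET_NAME,
	SK_SET_VALUE,
	SK_SET_FLAG,
	SK_REMOVE_OLD,
	SK_DONE,
};

struct sketch_ctx {
	enum sketch_state	state;
	bool			rename;		/* replacing an existing attr */
	int			conversions_left; /* pending format conversions */
	bool			old_removed;
	int			rolls;
};

/* Every state transition is visible in this one switch. */
int
sketch_set_iter(struct sketch_ctx *ctx)
{
	assert(ctx != NULL);

	switch (ctx->state) {
	case SK_SET_FORMAT:
		if (ctx->conversions_left > 0) {
			ctx->conversions_left--;	/* sf->leaf or leaf->node */
			return -EAGAIN;		/* roll, then retry this state */
		}
		ctx->state = SK_SET_NAME;
		/* fall through: format already fits, no roll needed */
	case SK_SET_NAME:
		ctx->state = SK_SET_VALUE;	/* name added */
		return -EAGAIN;
	case SK_SET_VALUE:
		ctx->state = SK_SET_FLAG;	/* rmt blocks allocated, value set */
		return -EAGAIN;
	case SK_SET_FLAG:
		if (!ctx->rename) {
			ctx->state = SK_DONE;	/* clear flag and finish */
			return 0;
		}
		ctx->state = SK_REMOVE_OLD;	/* flags flipped */
		return -EAGAIN;
	case SK_REMOVE_OLD:
		ctx->old_removed = true;	/* would enter the remove path */
		ctx->state = SK_DONE;
		return 0;
	case SK_DONE:
		return 0;
	}
	return -EINVAL;
}

int
sketch_set(struct sketch_ctx *ctx)
{
	int error;

	while ((error = sketch_set_iter(ctx)) == -EAGAIN)
		ctx->rolls++;	/* stand-in for rolling the transaction */
	return error;
}
```

Whether the real code can be carved up along these lines is exactly the
open question, so treat the state names as placeholders.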

Brian

> Allison
> 
> > 
> > Brian
> > 
> > > I actually find it easier to work on it from the top of the set rather than
> > > the bottom.  Just so that the end goal of what it will end up looking like
> > > is a little more clear.  Once the goal is clear, then I worry about layering
> > > it in whatever patch it goes in.  Otherwise it's harder to see exactly how
> > > the conflicts shake out.
> > > 
> > > Allison
> > > > 
> > > > Brian
> > > > 
> > > > --- 8< ---
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > > index fd8e6418a0d3..eff8833d5303 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > > @@ -58,6 +58,8 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
> > > >    				 struct xfs_da_state **state);
> > > >    STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> > > >    STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
> > > > +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *, struct xfs_buf *);
> > > > +STATIC int xfs_attr_node_addname_work(struct xfs_da_args *);
> > > >    int
> > > >    xfs_inode_hasattr(
> > > > @@ -216,116 +218,93 @@ xfs_attr_is_shortform(
> > > >    		ip->i_afp->if_nextents == 0);
> > > >    }
> > > > -/*
> > > > - * Attempts to set an attr in shortform, or converts short form to leaf form if
> > > > - * there is not enough room.  If the attr is set, the transaction is committed
> > > > - * and set to NULL.
> > > > - */
> > > > -STATIC int
> > > > -xfs_attr_set_shortform(
> > > > +int
> > > > +xfs_attr_set_fmt(
> > > >    	struct xfs_da_args	*args,
> > > > -	struct xfs_buf		**leaf_bp)
> > > > +	bool			*done)
> > > >    {
> > > >    	struct xfs_inode	*dp = args->dp;
> > > > -	int			error, error2 = 0;
> > > > +	struct xfs_buf		*leaf_bp = NULL;
> > > > +	int			error = 0;
> > > > -	/*
> > > > -	 * Try to add the attr to the attribute list in the inode.
> > > > -	 */
> > > > -	error = xfs_attr_try_sf_addname(dp, args);
> > > > -	if (error != -ENOSPC) {
> > > > -		error2 = xfs_trans_commit(args->trans);
> > > > -		args->trans = NULL;
> > > > -		return error ? error : error2;
> > > > +	if (xfs_attr_is_shortform(dp)) {
> > > > +		error = xfs_attr_try_sf_addname(dp, args);
> > > > +		if (!error)
> > > > +			*done = true;
> > > > +		if (error != -ENOSPC)
> > > > +			return error;
> > > > +
> > > > +		error = xfs_attr_shortform_to_leaf(args, &leaf_bp);
> > > > +		if (error)
> > > > +			return error;
> > > > +		return -EAGAIN;
> > > >    	}
> > > > -	/*
> > > > -	 * It won't fit in the shortform, transform to a leaf block.  GROT:
> > > > -	 * another possible req'mt for a double-split btree op.
> > > > -	 */
> > > > -	error = xfs_attr_shortform_to_leaf(args, leaf_bp);
> > > > -	if (error)
> > > > -		return error;
> > > > -	/*
> > > > -	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
> > > > -	 * push cannot grab the half-baked leaf buffer and run into problems
> > > > -	 * with the write verifier. Once we're done rolling the transaction we
> > > > -	 * can release the hold and add the attr to the leaf.
> > > > -	 */
> > > > -	xfs_trans_bhold(args->trans, *leaf_bp);
> > > > -	error = xfs_defer_finish(&args->trans);
> > > > -	xfs_trans_bhold_release(args->trans, *leaf_bp);
> > > > -	if (error) {
> > > > -		xfs_trans_brelse(args->trans, *leaf_bp);
> > > > -		return error;
> > > > +	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> > > > +		struct xfs_buf	*bp = NULL;
> > > > +
> > > > +		error = xfs_attr_leaf_try_add(args, bp);
> > > > +		if (error != -ENOSPC)
> > > > +			return error;
> > > > +
> > > > +		error = xfs_attr3_leaf_to_node(args);
> > > > +		if (error)
> > > > +			return error;
> > > > +		return -EAGAIN;
> > > >    	}
> > > > -	return 0;
> > > > +	return xfs_attr_node_addname(args);
> > > >    }
> > > >    /*
> > > >     * Set the attribute specified in @args.
> > > >     */
> > > >    int
> > > > -xfs_attr_set_args(
> > > > +__xfs_attr_set_args(
> > > >    	struct xfs_da_args	*args)
> > > >    {
> > > >    	struct xfs_inode	*dp = args->dp;
> > > > -	struct xfs_buf          *leaf_bp = NULL;
> > > >    	int			error = 0;
> > > > -	/*
> > > > -	 * If the attribute list is already in leaf format, jump straight to
> > > > -	 * leaf handling.  Otherwise, try to add the attribute to the shortform
> > > > -	 * list; if there's no room then convert the list to leaf format and try
> > > > -	 * again.
> > > > -	 */
> > > > -	if (xfs_attr_is_shortform(dp)) {
> > > > -
> > > > -		/*
> > > > -		 * If the attr was successfully set in shortform, the
> > > > -		 * transaction is committed and set to NULL.  Otherwise, is it
> > > > -		 * converted from shortform to leaf, and the transaction is
> > > > -		 * retained.
> > > > -		 */
> > > > -		error = xfs_attr_set_shortform(args, &leaf_bp);
> > > > -		if (error || !args->trans)
> > > > -			return error;
> > > > -	}
> > > > -
> > > >    	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> > > >    		error = xfs_attr_leaf_addname(args);
> > > > -		if (error != -ENOSPC)
> > > > -			return error;
> > > > -
> > > > -		/*
> > > > -		 * Promote the attribute list to the Btree format.
> > > > -		 */
> > > > -		error = xfs_attr3_leaf_to_node(args);
> > > >    		if (error)
> > > >    			return error;
> > > > +	}
> > > > +
> > > > +	error = xfs_attr_node_addname_work(args);
> > > > +	return error;
> > > > +}
> > > > +
> > > > +int
> > > > +xfs_attr_set_args(
> > > > +	struct xfs_da_args	*args)
> > > > +
> > > > +{
> > > > +	int			error;
> > > > +	bool			done = false;
> > > > +
> > > > +	do {
> > > > +		error = xfs_attr_set_fmt(args, &done);
> > > > +		if (error != -EAGAIN)
> > > > +			break;
> > > > -		/*
> > > > -		 * Finish any deferred work items and roll the transaction once
> > > > -		 * more.  The goal here is to call node_addname with the inode
> > > > -		 * and transaction in the same state (inode locked and joined,
> > > > -		 * transaction clean) no matter how we got to this step.
> > > > -		 */
> > > >    		error = xfs_defer_finish(&args->trans);
> > > >    		if (error)
> > > > -			return error;
> > > > +			break;
> > > > +		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > > +	} while (!error);
> > > > -		/*
> > > > -		 * Commit the current trans (including the inode) and
> > > > -		 * start a new one.
> > > > -		 */
> > > > -		error = xfs_trans_roll_inode(&args->trans, dp);
> > > > -		if (error)
> > > > -			return error;
> > > > -	}
> > > > +	if (error || done)
> > > > +		return error;
> > > > -	error = xfs_attr_node_addname(args);
> > > > -	return error;
> > > > +	error = xfs_defer_finish(&args->trans);
> > > > +	if (!error)
> > > > +		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > > +	if (error)
> > > > +		return error;
> > > > +
> > > > +	return __xfs_attr_set_args(args);
> > > >    }
> > > >    /*
> > > > @@ -676,18 +655,6 @@ xfs_attr_leaf_addname(
> > > >    	trace_xfs_attr_leaf_addname(args);
> > > > -	error = xfs_attr_leaf_try_add(args, bp);
> > > > -	if (error)
> > > > -		return error;
> > > > -
> > > > -	/*
> > > > -	 * Commit the transaction that added the attr name so that
> > > > -	 * later routines can manage their own transactions.
> > > > -	 */
> > > > -	error = xfs_trans_roll_inode(&args->trans, dp);
> > > > -	if (error)
> > > > -		return error;
> > > > -
> > > >    	/*
> > > >    	 * If there was an out-of-line value, allocate the blocks we
> > > >    	 * identified for its storage and copy the value.  This is done
> > > > @@ -923,7 +890,7 @@ xfs_attr_node_addname(
> > > >    	 * Fill in bucket of arguments/results/context to carry around.
> > > >    	 */
> > > >    	dp = args->dp;
> > > > -restart:
> > > > +
> > > >    	/*
> > > >    	 * Search to see if name already exists, and get back a pointer
> > > >    	 * to where it should go.
> > > > @@ -967,21 +934,10 @@ xfs_attr_node_addname(
> > > >    			xfs_da_state_free(state);
> > > >    			state = NULL;
> > > >    			error = xfs_attr3_leaf_to_node(args);
> > > > -			if (error)
> > > > -				goto out;
> > > > -			error = xfs_defer_finish(&args->trans);
> > > >    			if (error)
> > > >    				goto out;
> > > > -			/*
> > > > -			 * Commit the node conversion and start the next
> > > > -			 * trans in the chain.
> > > > -			 */
> > > > -			error = xfs_trans_roll_inode(&args->trans, dp);
> > > > -			if (error)
> > > > -				goto out;
> > > > -
> > > > -			goto restart;
> > > > +			return -EAGAIN;
> > > >    		}
> > > >    		/*
> > > > @@ -993,9 +949,6 @@ xfs_attr_node_addname(
> > > >    		error = xfs_da3_split(state);
> > > >    		if (error)
> > > >    			goto out;
> > > > -		error = xfs_defer_finish(&args->trans);
> > > > -		if (error)
> > > > -			goto out;
> > > >    	} else {
> > > >    		/*
> > > >    		 * Addition succeeded, update Btree hashvals.
> > > > @@ -1010,13 +963,23 @@ xfs_attr_node_addname(
> > > >    	xfs_da_state_free(state);
> > > >    	state = NULL;
> > > > -	/*
> > > > -	 * Commit the leaf addition or btree split and start the next
> > > > -	 * trans in the chain.
> > > > -	 */
> > > > -	error = xfs_trans_roll_inode(&args->trans, dp);
> > > > +	return 0;
> > > > +
> > > > +out:
> > > > +	if (state)
> > > > +		xfs_da_state_free(state);
> > > >    	if (error)
> > > > -		goto out;
> > > > +		return error;
> > > > +	return retval;
> > > > +}
> > > > +
> > > > +STATIC int
> > > > +xfs_attr_node_addname_work(
> > > > +	struct xfs_da_args	*args)
> > > > +{
> > > > +	struct xfs_da_state	*state;
> > > > +	struct xfs_da_state_blk	*blk;
> > > > +	int			retval, error;
> > > >    	/*
> > > >    	 * If there was an out-of-line value, allocate the blocks we
> > > > 
> > > 
> > 
>
Patch

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 1969b88..b6330f9 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -53,7 +53,7 @@  STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
  */
 STATIC int xfs_attr_node_get(xfs_da_args_t *args);
 STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
-STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
+STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
 STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
 				 struct xfs_da_state **state);
 STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
@@ -264,6 +264,34 @@  xfs_attr_set_shortform(
 }
 
 /*
+ * Checks to see if a delayed attribute transaction should be rolled.  If so,
+ * also checks for a defer finish.  Transaction is finished and rolled as
+ * needed.  Returns 0 on success, or a negative error code on failure.
+ */
+int
+xfs_attr_trans_roll(
+	struct xfs_delattr_context	*dac)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	int				error;
+
+	if (dac->flags & XFS_DAC_DEFER_FINISH) {
+		/*
+		 * The caller wants us to finish all the deferred ops so that we
+		 * avoid pinning the log tail with a large number of deferred
+		 * ops.
+		 */
+		dac->flags &= ~XFS_DAC_DEFER_FINISH;
+		error = xfs_defer_finish(&args->trans);
+		if (error)
+			return error;
+	} else
+		error = xfs_trans_roll_inode(&args->trans, args->dp);
+
+	return error;
+}
+
+/*
  * Set the attribute specified in @args.
  */
 int
@@ -364,23 +392,58 @@  xfs_has_attr(
  */
 int
 xfs_attr_remove_args(
-	struct xfs_da_args      *args)
+	struct xfs_da_args	*args)
 {
-	struct xfs_inode	*dp = args->dp;
-	int			error;
+	int				error;
+	struct xfs_delattr_context	dac = {
+		.da_args	= args,
+	};
+
+	do {
+		error = xfs_attr_remove_iter(&dac);
+		if (error != -EAGAIN)
+			break;
+
+		error = xfs_attr_trans_roll(&dac);
+		if (error)
+			return error;
+
+	} while (true);
+
+	return error;
+}
 
-	if (!xfs_inode_hasattr(dp)) {
-		error = -ENOATTR;
-	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
+/*
+ * Remove the attribute specified in @args.
+ *
+ * This function may return -EAGAIN to signal that the transaction needs to be
+ * rolled.  Callers should continue calling this function until they receive a
+ * return value other than -EAGAIN.
+ */
+int
+xfs_attr_remove_iter(
+	struct xfs_delattr_context	*dac)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_inode		*dp = args->dp;
+
+	/* If we are shrinking a node, resume shrink */
+	if (dac->dela_state == XFS_DAS_RM_SHRINK)
+		goto node;
+
+	if (!xfs_inode_hasattr(dp))
+		return -ENOATTR;
+
+	if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
 		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
-		error = xfs_attr_shortform_remove(args);
-	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
-		error = xfs_attr_leaf_removename(args);
-	} else {
-		error = xfs_attr_node_removename(args);
+		return xfs_attr_shortform_remove(args);
 	}
 
-	return error;
+	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
+		return xfs_attr_leaf_removename(args);
+node:
+	/* If we are not short form or leaf, then proceed to remove node */
+	return xfs_attr_node_removename_iter(dac);
 }
 
 /*
@@ -1178,10 +1241,11 @@  xfs_attr_leaf_mark_incomplete(
  */
 STATIC
 int xfs_attr_node_removename_setup(
-	struct xfs_da_args	*args,
-	struct xfs_da_state	**state)
+	struct xfs_delattr_context	*dac)
 {
-	int			error;
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_state		**state = &dac->da_state;
+	int				error;
 
 	error = xfs_attr_node_hasname(args, state);
 	if (error != -EEXIST)
@@ -1203,13 +1267,16 @@  int xfs_attr_node_removename_setup(
 }
 
 STATIC int
-xfs_attr_node_remove_rmt(
-	struct xfs_da_args	*args,
-	struct xfs_da_state	*state)
+xfs_attr_node_remove_rmt(
+	struct xfs_delattr_context	*dac,
+	struct xfs_da_state		*state)
 {
-	int			error = 0;
+	int				error = 0;
 
-	error = xfs_attr_rmtval_remove(args);
+	/*
+	 * May return -EAGAIN to request that the caller recall this function
+	 */
+	error = __xfs_attr_rmtval_remove(dac);
 	if (error)
 		return error;
 
@@ -1240,28 +1307,34 @@  xfs_attr_node_remove_cleanup(
 }
 
 /*
- * Remove a name from a B-tree attribute list.
+ * Step through removing a name from a B-tree attribute list.
  *
  * This will involve walking down the Btree, and may involve joining
  * leaf nodes and even joining intermediate nodes up to and including
  * the root node (a special case of an intermediate node).
+ *
+ * This routine is meant to function as either an inline or delayed operation,
+ * and may return -EAGAIN when the transaction needs to be rolled.  Calling
+ * functions will need to handle this, and recall the function until it
+ * returns a value other than -EAGAIN.
  */
 STATIC int
 xfs_attr_node_remove_step(
-	struct xfs_da_args	*args,
-	struct xfs_da_state	*state)
+	struct xfs_delattr_context	*dac)
 {
-	int			error;
-	struct xfs_inode	*dp = args->dp;
-
-
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_state		*state = dac->da_state;
+	int				error = 0;
 	/*
 	 * If there is an out-of-line value, de-allocate the blocks.
 	 * This is done before we remove the attribute so that we don't
 	 * overflow the maximum size of a transaction and/or hit a deadlock.
 	 */
 	if (args->rmtblkno > 0) {
-		error = xfs_attr_node_remove_rmt(args, state);
+		/*
+		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
+		 */
+		error = xfs_attr_node_remove_rmt(dac, state);
 		if (error)
 			return error;
 	}
@@ -1274,51 +1347,74 @@  xfs_attr_node_remove_step(
  *
  * This routine will find the blocks of the name to remove, remove them and
  * shrink the tree if needed.
+ *
+ * This routine is meant to function as either an inline or delayed operation,
+ * and may return -EAGAIN when the transaction needs to be rolled.  Calling
+ * functions will need to handle this, and recall the function until it
+ * returns a value other than -EAGAIN.
  */
 STATIC int
-xfs_attr_node_removename(
-	struct xfs_da_args	*args)
+xfs_attr_node_removename_iter(
+	struct xfs_delattr_context	*dac)
 {
-	struct xfs_da_state	*state = NULL;
-	int			retval, error;
-	struct xfs_inode	*dp = args->dp;
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_state		*state = NULL;
+	int				retval, error;
+	struct xfs_inode		*dp = args->dp;
 
 	trace_xfs_attr_node_removename(args);
 
-	error = xfs_attr_node_removename_setup(args, &state);
-	if (error)
-		goto out;
+	if (!dac->da_state) {
+		error = xfs_attr_node_removename_setup(dac);
+		if (error)
+			goto out;
+	}
+	state = dac->da_state;
 
-	error = xfs_attr_node_remove_step(args, state);
-	if (error)
-		goto out;
+	switch (dac->dela_state) {
+	case XFS_DAS_UNINIT:
+		/*
+		 * repeatedly remove remote blocks, remove the entry and join.
+		 * returns -EAGAIN or 0 for completion of the step.
+		 */
+		error = xfs_attr_node_remove_step(dac);
+		if (error)
+			break;
 
-	retval = xfs_attr_node_remove_cleanup(args, state);
+		retval = xfs_attr_node_remove_cleanup(args, state);
 
-	/*
-	 * Check to see if the tree needs to be collapsed.
-	 */
-	if (retval && (state->path.active > 1)) {
-		error = xfs_da3_join(state);
-		if (error)
-			return error;
-		error = xfs_defer_finish(&args->trans);
-		if (error)
-			return error;
 		/*
-		 * Commit the Btree join operation and start a new trans.
+		 * Check to see if the tree needs to be collapsed.  If so, set
+		 * the flag and state so that the calling function finishes the
+		 * deferred ops and re-enters at the shrink operation.
 		 */
-		error = xfs_trans_roll_inode(&args->trans, dp);
-		if (error)
-			return error;
-	}
+		if (retval && (state->path.active > 1)) {
+			error = xfs_da3_join(state);
+			if (error)
+				return error;
 
-	/*
-	 * If the result is small enough, push it all into the inode.
-	 */
-	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
-		error = xfs_attr_node_shrink(args, state);
+			dac->flags |= XFS_DAC_DEFER_FINISH;
+			dac->dela_state = XFS_DAS_RM_SHRINK;
+			return -EAGAIN;
+		}
+
+		/* fallthrough */
+	case XFS_DAS_RM_SHRINK:
+		/*
+		 * If the result is small enough, push it all into the inode.
+		 */
+		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
+			error = xfs_attr_node_shrink(args, state);
+
+		break;
+	default:
+		ASSERT(0);
+		error = -EINVAL;
+		goto out;
+	}
 
+	if (error == -EAGAIN)
+		return error;
 out:
 	if (state)
 		xfs_da_state_free(state);
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 3e97a93..3154ef4 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -74,6 +74,102 @@  struct xfs_attr_list_context {
 };
 
 
+/*
+ * ========================================================================
+ * Structure used to pass context around among the delayed routines.
+ * ========================================================================
+ */
+
+/*
+ * Below is a state machine diagram for attr remove operations. The XFS_DAS_*
+ * states indicate places where the function would return -EAGAIN, and then
+ * immediately resume from after being recalled by the calling function. States
+ * marked as a "subroutine state" indicate that they belong to a subroutine, and
+ * so the calling function needs to pass them back to that subroutine to allow
+ * it to finish where it left off. But they otherwise do not have a role in the
+ * calling function other than just passing through.
+ *
+ * xfs_attr_remove_iter()
+ *              │
+ *              v
+ *        found attr blks? ───n──┐
+ *              │                v
+ *              │         find and invalidate
+ *              y         the blocks. mark
+ *              │         attr incomplete
+ *              ├────────────────┘
+ *              │
+ *              v
+ *      remove a block with
+ *    xfs_attr_node_remove_step <────┐
+ *              │                    │
+ *              v                    │
+ *      still have blks ──y──> return -EAGAIN.
+ *        to remove?          re-enter with one
+ *              │            less blk to remove
+ *              n
+ *              │
+ *              v
+ *       remove leaf and
+ *       update hash with
+ *   xfs_attr_node_remove_cleanup
+ *              │
+ *              v
+ *           need to
+ *        shrink tree? ─n─┐
+ *              │         │
+ *              y         │
+ *              │         │
+ *              v         │
+ *          join leaf     │
+ *              │         │
+ *              v         │
+ *      XFS_DAS_RM_SHRINK │
+ *              │         │
+ *              v         │
+ *       do the shrink    │
+ *              │         │
+ *              v         │
+ *          free state <──┘
+ *              │
+ *              v
+ *            done
+ *
+ */
+
+/*
+ * Enum values for xfs_delattr_context.dela_state
+ *
+ * These values are used by delayed attribute operations to keep track of where
+ * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
+ * calling function to roll the transaction, and then recall the subroutine to
+ * finish the operation.  The enum is then used by the subroutine to jump back
+ * to where it was and resume executing where it left off.
+ */
+enum xfs_delattr_state {
+	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
+	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
+};
+
+/*
+ * Defines for xfs_delattr_context.flags
+ */
+#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
+
+/*
+ * Context used for keeping track of delayed attribute operations
+ */
+struct xfs_delattr_context {
+	struct xfs_da_args      *da_args;
+
+	/* Used in xfs_attr_node_removename_iter to roll through block removal */
+	struct xfs_da_state     *da_state;
+
+	/* Used to keep track of current state of delayed operation */
+	unsigned int            flags;
+	enum xfs_delattr_state  dela_state;
+};
+
 /*========================================================================
  * Function prototypes for the kernel.
  *========================================================================*/
@@ -91,6 +187,10 @@  int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_args(struct xfs_da_args *args);
 int xfs_has_attr(struct xfs_da_args *args);
 int xfs_attr_remove_args(struct xfs_da_args *args);
+int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
+int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
 bool xfs_attr_namecheck(const void *name, size_t length);
+void xfs_delattr_context_init(struct xfs_delattr_context *dac,
+			      struct xfs_da_args *args);
 
 #endif	/* __XFS_ATTR_H__ */
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index d6ef69a..3780141 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -19,8 +19,8 @@ 
 #include "xfs_bmap_btree.h"
 #include "xfs_bmap.h"
 #include "xfs_attr_sf.h"
-#include "xfs_attr_remote.h"
 #include "xfs_attr.h"
+#include "xfs_attr_remote.h"
 #include "xfs_attr_leaf.h"
 #include "xfs_error.h"
 #include "xfs_trace.h"
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 48d8e9c..f09820c 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -674,10 +674,12 @@  xfs_attr_rmtval_invalidate(
  */
 int
 xfs_attr_rmtval_remove(
-	struct xfs_da_args      *args)
+	struct xfs_da_args		*args)
 {
-	int			error;
-	int			retval;
+	int				error;
+	struct xfs_delattr_context	dac = {
+		.da_args	= args,
+	};
 
 	trace_xfs_attr_rmtval_remove(args);
 
@@ -685,31 +687,29 @@  xfs_attr_rmtval_remove(
 	 * Keep de-allocating extents until the remote-value region is gone.
 	 */
 	do {
-		retval = __xfs_attr_rmtval_remove(args);
-		if (retval && retval != -EAGAIN)
-			return retval;
+		error = __xfs_attr_rmtval_remove(&dac);
+		if (error != -EAGAIN)
+			break;
 
-		/*
-		 * Close out trans and start the next one in the chain.
-		 */
-		error = xfs_trans_roll_inode(&args->trans, args->dp);
+		error = xfs_attr_trans_roll(&dac);
 		if (error)
 			return error;
-	} while (retval == -EAGAIN);
+	} while (true);
 
-	return 0;
+	return error;
 }
 
 /*
  * Remove the value associated with an attribute by deleting the out-of-line
- * buffer that it is stored on. Returns EAGAIN for the caller to refresh the
+ * buffer that it is stored on. Returns -EAGAIN for the caller to refresh the
  * transaction and re-call the function
  */
 int
 __xfs_attr_rmtval_remove(
-	struct xfs_da_args	*args)
+	struct xfs_delattr_context	*dac)
 {
-	int			error, done;
+	struct xfs_da_args		*args = dac->da_args;
+	int				error, done;
 
 	/*
 	 * Unmap value blocks for this attr.
@@ -719,12 +719,20 @@  __xfs_attr_rmtval_remove(
 	if (error)
 		return error;
 
-	error = xfs_defer_finish(&args->trans);
-	if (error)
-		return error;
-
-	if (!done)
+	/*
+	 * We don't need an explicit state here to pick up where we left off.  We
+	 * can figure it out using the !done return code.  The calling function
+	 * only needs to keep recalling this routine until we indicate to stop by
+	 * returning anything other than -EAGAIN.  The actual value of
+	 * dac->dela_state may be some value reminiscent of the calling
+	 * function, but its value is irrelevant within the context of this
+	 * function.  Once we are done here, the next state is set as needed
+	 * by the parent.
+	 */
+	if (!done) {
+		dac->flags |= XFS_DAC_DEFER_FINISH;
 		return -EAGAIN;
+	}
 
 	return error;
 }
diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
index 9eee615..002fd30 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.h
+++ b/fs/xfs/libxfs/xfs_attr_remote.h
@@ -14,5 +14,5 @@  int xfs_attr_rmtval_remove(struct xfs_da_args *args);
 int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
 		xfs_buf_flags_t incore_flags);
 int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
-int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
+int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
 #endif /* __XFS_ATTR_REMOTE_H__ */
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
index bfad669..aaa7e66 100644
--- a/fs/xfs/xfs_attr_inactive.c
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -15,10 +15,10 @@ 
 #include "xfs_da_format.h"
 #include "xfs_da_btree.h"
 #include "xfs_inode.h"
+#include "xfs_attr.h"
 #include "xfs_attr_remote.h"
 #include "xfs_trans.h"
 #include "xfs_bmap.h"
-#include "xfs_attr.h"
 #include "xfs_attr_leaf.h"
 #include "xfs_quota.h"
 #include "xfs_dir2.h"
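
For readers following the control flow rather than the hunks, the -EAGAIN
pattern this patch introduces can be modelled outside the kernel. What
follows is only a rough userspace sketch, not code from the series: every
demo_* and DEMO_* name is hypothetical, the transaction is reduced to two
counters, and "removing a remote block" is a decrement. It mirrors the shape
of xfs_attr_remove_args (the loop), xfs_attr_trans_roll (flag check, then
finish or roll) and xfs_attr_node_removename_iter (the state switch).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the patch's enum and flag; not XFS names. */
enum demo_state { DEMO_UNINIT = 0, DEMO_SHRINK };
#define DEMO_DEFER_FINISH	0x01

struct demo_ctx {
	int		blocks_left;	/* remote blocks still to remove */
	bool		need_shrink;	/* pretend the tree must be collapsed */
	unsigned int	flags;
	enum demo_state	state;
	int		finishes;	/* models xfs_defer_finish() calls */
	int		rolls;		/* models xfs_trans_roll_inode() calls */
};

/* One step of the operation; -EAGAIN asks the caller to refresh and recall. */
static int demo_remove_iter(struct demo_ctx *ctx)
{
	switch (ctx->state) {
	case DEMO_UNINIT:
		/* No explicit state is needed here: blocks_left says where we are. */
		if (ctx->blocks_left > 0 && --ctx->blocks_left > 0) {
			ctx->flags |= DEMO_DEFER_FINISH;
			return -EAGAIN;
		}
		if (ctx->need_shrink) {
			/* Record where to resume, then hand control back. */
			ctx->flags |= DEMO_DEFER_FINISH;
			ctx->state = DEMO_SHRINK;
			return -EAGAIN;
		}
		return 0;
	case DEMO_SHRINK:
		/* Resumed after the caller refreshed the transaction. */
		return 0;
	default:
		assert(0);
		return -EINVAL;
	}
}

/* Finish deferred work if requested, otherwise do a plain roll. */
static int demo_trans_roll(struct demo_ctx *ctx)
{
	if (ctx->flags & DEMO_DEFER_FINISH) {
		ctx->flags &= ~DEMO_DEFER_FINISH;
		ctx->finishes++;
	} else {
		ctx->rolls++;
	}
	return 0;
}

/* The caller's loop: keep stepping until something other than -EAGAIN. */
int demo_remove_args(struct demo_ctx *ctx)
{
	int error;

	do {
		error = demo_remove_iter(ctx);
		if (error != -EAGAIN)
			break;
		error = demo_trans_roll(ctx);
		if (error)
			return error;
	} while (true);

	return error;
}
```

Calling demo_remove_args() on a zero-initialised context with blocks_left
set to 3 and need_shrink set to true steps through three -EAGAIN returns
(two for remaining blocks, one for the shrink hand-off) before returning 0.
The point of the split is that the step function never touches the
transaction itself, which is what lets a deferred-op caller drive the same
routine later.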