[v7,13/19] xfs: Add delay ready attr remove routines

Message ID 20200223020611.1802-14-allison.henderson@oracle.com (mailing list archive)
State Superseded
Series xfs: Delayed Ready Attrs

Commit Message

Allison Henderson Feb. 23, 2020, 2:06 a.m. UTC
This patch modifies the attr remove routines to be delay ready. This means they no
longer roll or commit transactions, but instead return -EAGAIN to have the calling
routine roll and refresh the transaction. In this series, xfs_attr_remove_args has
become xfs_attr_remove_iter, which uses a state-machine-like switch to keep
track of where it was when -EAGAIN was returned. xfs_attr_node_removename has also
been modified to use the switch, and a new version of xfs_attr_remove_args
consists of a simple loop to refresh the transaction until the operation is
completed.

This patch also adds a new struct xfs_delattr_context, which we will use to keep
track of the current state of an attribute operation. The new xfs_delattr_state
enum is used to track various operations that are in progress so that we know not
to repeat them, and can resume where we left off before -EAGAIN was returned to
cycle out the transaction. Other members take the place of local variables that
need to retain their values across repeated calls to the function.

Below is a state machine diagram for attr remove operations. The XFS_DAS_* states
indicate places where the function would return -EAGAIN, and then resume from
that point once it is called again by the calling function. States marked as a
"subroutine state" belong to a subroutine, so the calling function needs to pass
them back to that subroutine to allow it to finish where it left off. They
otherwise have no role in the calling function other than passing through.

 xfs_attr_remove_iter()
         XFS_DAS_RM_SHRINK     ─┐
         (subroutine state)     │
                                │
         XFS_DAS_RMTVAL_REMOVE ─┤
         (subroutine state)     │
                                └─>xfs_attr_node_removename()
                                                 │
                                                 v
                                         need to remove
                                   ┌─n──  rmt blocks?
                                   │             │
                                   │             y
                                   │             │
                                   │             v
                                   │  ┌─>XFS_DAS_RMTVAL_REMOVE
                                   │  │          │
                                   │  │          v
                                   │  └──y── more blks
                                   │         to remove?
                                   │             │
                                   │             n
                                   │             │
                                   │             v
                                   │         need to
                                   └─────> shrink tree? ─n─┐
                                                 │         │
                                                 y         │
                                                 │         │
                                                 v         │
                                         XFS_DAS_RM_SHRINK │
                                                 │         │
                                                 v         │
                                                done <─────┘
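
The flow in the diagram can be approximated in userspace as a rough sketch. This
is not the kernel code: demo_state, demo_ctx, demo_node_removename and
demo_remove are hypothetical stand-ins for the XFS_DAS_* enum,
xfs_delattr_context, xfs_attr_node_removename and the xfs_attr_remove_args
loop, and "rolling the transaction" is reduced to incrementing a counter.

```c
#include <errno.h>

/* Hypothetical stand-ins for the XFS_DAS_* states; not the kernel enum. */
enum demo_state {
	DEMO_START = 0,
	DEMO_RMTVAL_REMOVE,
	DEMO_RM_SHRINK,
};

/* Values that must survive across calls live here, not in locals. */
struct demo_ctx {
	enum demo_state	state;
	int		rmt_blocks;	/* remote blocks still to remove */
	int		need_shrink;	/* nonzero if the tree must shrink */
	int		done;		/* set once the operation completes */
	int		rolls;		/* times the caller "rolled" */
};

/* Returns -EAGAIN when the caller should roll and call again, 0 when done. */
int demo_node_removename(struct demo_ctx *ctx)
{
	/* State machine switch: jump back to where we left off. */
	switch (ctx->state) {
	case DEMO_RMTVAL_REMOVE:
		goto rmtval_remove;
	case DEMO_RM_SHRINK:
		goto rm_shrink;
	default:
		break;
	}

rmtval_remove:
	if (ctx->rmt_blocks > 0) {
		ctx->rmt_blocks--;	/* one block per transaction */
		if (ctx->rmt_blocks > 0) {
			ctx->state = DEMO_RMTVAL_REMOVE;
			return -EAGAIN;
		}
	}

	if (ctx->need_shrink) {
		ctx->need_shrink = 0;	/* the join step happened here */
		ctx->state = DEMO_RM_SHRINK;
		return -EAGAIN;
	}

rm_shrink:
	ctx->done = 1;
	return 0;
}

/* Caller loop: keep calling until something other than -EAGAIN comes back. */
int demo_remove(struct demo_ctx *ctx)
{
	int error;

	while ((error = demo_node_removename(ctx)) == -EAGAIN)
		ctx->rolls++;	/* a real caller would roll the transaction */

	return error;
}
```

With three remote blocks and a tree that needs shrinking, the sketch returns
-EAGAIN three times (twice from the block removal loop, once before the shrink)
and then completes on the fourth call.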

Signed-off-by: Allison Collins <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c     | 114 +++++++++++++++++++++++++++++++++++++------
 fs/xfs/libxfs/xfs_attr.h     |   1 +
 fs/xfs/libxfs/xfs_da_btree.h |  30 ++++++++++++
 fs/xfs/scrub/common.c        |   2 +
 fs/xfs/xfs_acl.c             |   2 +
 fs/xfs/xfs_attr_list.c       |   1 +
 fs/xfs/xfs_ioctl.c           |   2 +
 fs/xfs/xfs_ioctl32.c         |   2 +
 fs/xfs/xfs_iops.c            |   2 +
 fs/xfs/xfs_xattr.c           |   1 +
 10 files changed, 141 insertions(+), 16 deletions(-)

Comments

Brian Foster Feb. 24, 2020, 3:25 p.m. UTC | #1
On Sat, Feb 22, 2020 at 07:06:05PM -0700, Allison Collins wrote:
> This patch modifies the attr remove routines to be delay ready. This means they no
> longer roll or commit transactions, but instead return -EAGAIN to have the calling
> routine roll and refresh the transaction. In this series, xfs_attr_remove_args has
> become xfs_attr_remove_iter, which uses a sort of state machine like switch to keep
> track of where it was when EAGAIN was returned. xfs_attr_node_removename has also
> been modified to use the switch, and a  new version of xfs_attr_remove_args
> consists of a simple loop to refresh the transaction until the operation is
> completed.
> 
> This patch also adds a new struct xfs_delattr_context, which we will use to keep
> track of the current state of an attribute operation. The new xfs_delattr_state
> enum is used to track various operations that are in progress so that we know not
> to repeat them, and resume where we left off before EAGAIN was returned to cycle
> out the transaction. Other members take the place of local variables that need
> to retain their values across multiple function recalls.
> 
> Below is a state machine diagram for attr remove operations. The XFS_DAS_* states
> indicate places where the function would return -EAGAIN, and then immediately
> resume from after being recalled by the calling function.  States marked as a
> "subroutine state" indicate that they belong to a subroutine, and so the calling
> function needs to pass them back to that subroutine to allow it to finish where
> it left off. But they otherwise do not have a role in the calling function other
> than just passing through.
> 
>  xfs_attr_remove_iter()
>          XFS_DAS_RM_SHRINK     ─┐
>          (subroutine state)     │
>                                 │
>          XFS_DAS_RMTVAL_REMOVE ─┤
>          (subroutine state)     │
>                                 └─>xfs_attr_node_removename()
>                                                  │
>                                                  v
>                                          need to remove
>                                    ┌─n──  rmt blocks?
>                                    │             │
>                                    │             y
>                                    │             │
>                                    │             v
>                                    │  ┌─>XFS_DAS_RMTVAL_REMOVE
>                                    │  │          │
>                                    │  │          v
>                                    │  └──y── more blks
>                                    │         to remove?
>                                    │             │
>                                    │             n
>                                    │             │
>                                    │             v
>                                    │         need to
>                                    └─────> shrink tree? ─n─┐
>                                                  │         │
>                                                  y         │
>                                                  │         │
>                                                  v         │
>                                          XFS_DAS_RM_SHRINK │
>                                                  │         │
>                                                  v         │
>                                                 done <─────┘
> 

Wow. :P I guess I have nothing against verbose commit logs, but I wonder
how useful this level of documentation is for a patch that shouldn't
really change the existing flow of the operation.

> Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c     | 114 +++++++++++++++++++++++++++++++++++++------
>  fs/xfs/libxfs/xfs_attr.h     |   1 +
>  fs/xfs/libxfs/xfs_da_btree.h |  30 ++++++++++++
>  fs/xfs/scrub/common.c        |   2 +
>  fs/xfs/xfs_acl.c             |   2 +
>  fs/xfs/xfs_attr_list.c       |   1 +
>  fs/xfs/xfs_ioctl.c           |   2 +
>  fs/xfs/xfs_ioctl32.c         |   2 +
>  fs/xfs/xfs_iops.c            |   2 +
>  fs/xfs/xfs_xattr.c           |   1 +
>  10 files changed, 141 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 5d73bdf..cd3a3f7 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -368,11 +368,60 @@ xfs_has_attr(
>   */
>  int
>  xfs_attr_remove_args(
> +	struct xfs_da_args	*args)
> +{
> +	int			error = 0;
> +	int			err2 = 0;
> +
> +	do {
> +		error = xfs_attr_remove_iter(args);
> +		if (error && error != -EAGAIN)
> +			goto out;
> +

I'm a little confused on the logic of this loop given that the only
caller commits the transaction (which also finishes dfops). IOW, it
seems we shouldn't ever need to finish/roll when error != -EAGAIN. If
that is the case, this can be simplified to something like:

int
xfs_attr_remove_args(
        struct xfs_da_args      *args)
{
        int                     error;

        do {
                error = xfs_attr_remove_iter(args);
                if (error != -EAGAIN)
                        break;

                if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
                        args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
                        error = xfs_defer_finish(&args->trans);
                        if (error)
                                break;
                }

                error = xfs_trans_roll_inode(&args->trans, args->dp);
                if (error)
                        break;
        } while (true);

        return error;
}

That has the added benefit of eliminating the whole err2 pattern, which
always strikes me as a landmine.
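
To make the difference between the two loop shapes concrete, here is a small
userspace sketch. Everything in it is made up for illustration: mock_op,
mock_iter and mock_roll stand in for the args, xfs_attr_remove_iter and
xfs_trans_roll_inode, and the defer-finish step is left out. It only counts how
often each loop rolls.

```c
#include <errno.h>

/* Hypothetical mock of an operation that needs a few transaction rolls. */
struct mock_op {
	int	steps_left;	/* times the iterator still returns -EAGAIN */
	int	rolls;		/* times the transaction was "rolled" */
};

/* Mock iterator: -EAGAIN while work remains, 0 once finished. */
int mock_iter(struct mock_op *op)
{
	if (op->steps_left > 0) {
		op->steps_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Mock roll: always succeeds, just counts. */
int mock_roll(struct mock_op *op)
{
	op->rolls++;
	return 0;
}

/* Loop shape from the patch: a second error variable, roll on every pass. */
int loop_with_err2(struct mock_op *op)
{
	int	error;
	int	err2;

	do {
		error = mock_iter(op);
		if (error && error != -EAGAIN)
			goto out;

		err2 = mock_roll(op);
		if (err2) {
			error = err2;
			goto out;
		}
	} while (error == -EAGAIN);
out:
	return error;
}

/* Suggested shape: one error variable, roll only after -EAGAIN. */
int loop_single_error(struct mock_op *op)
{
	int	error;

	do {
		error = mock_iter(op);
		if (error != -EAGAIN)
			break;

		error = mock_roll(op);
		if (error)
			break;
	} while (1);

	return error;
}
```

For an operation that returns -EAGAIN twice, the first loop rolls three times
(including once after the final success) while the second rolls twice, which is
the extra finish/roll questioned above.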

> +		if (args->dac.flags & XFS_DAC_FINISH_TRANS) {

BTW, _FINISH_TRANS also seems misnamed given that we finish deferred
operations, not necessarily the transaction. XFS_DAC_DEFER_FINISH?

> +			args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
> +
> +			err2 = xfs_defer_finish(&args->trans);
> +			if (err2) {
> +				error = err2;
> +				goto out;
> +			}
> +		}
> +
> +		err2 = xfs_trans_roll_inode(&args->trans, args->dp);
> +		if (err2) {
> +			error = err2;
> +			goto out;
> +		}
> +
> +	} while (error == -EAGAIN);
> +out:
> +	return error;
> +}
> +
> +/*
> + * Remove the attribute specified in @args.
> + *
> + * This function may return -EAGAIN to signal that the transaction needs to be
> + * rolled.  Callers should continue calling this function until they receive a
> + * return value other than -EAGAIN.
> + */
> +int
> +xfs_attr_remove_iter(
>  	struct xfs_da_args      *args)
>  {
>  	struct xfs_inode	*dp = args->dp;
>  	int			error;
>  
> +	/* State machine switch */
> +	switch (args->dac.dela_state) {
> +	case XFS_DAS_RM_SHRINK:
> +	case XFS_DAS_RMTVAL_REMOVE:
> +		goto node;
> +	default:
> +		break;
> +	}
> +
>  	if (!xfs_inode_hasattr(dp)) {
>  		error = -ENOATTR;
>  	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> @@ -381,6 +430,7 @@ xfs_attr_remove_args(
>  	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>  		error = xfs_attr_leaf_removename(args);
>  	} else {
> +node:
>  		error = xfs_attr_node_removename(args);
>  	}
>  
> @@ -895,9 +945,8 @@ xfs_attr_leaf_removename(
>  		/* bp is gone due to xfs_da_shrink_inode */
>  		if (error)
>  			return error;
> -		error = xfs_defer_finish(&args->trans);
> -		if (error)
> -			return error;
> +
> +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
>  	}
>  	return 0;
>  }
> @@ -1218,6 +1267,11 @@ xfs_attr_node_addname(
>   * This will involve walking down the Btree, and may involve joining
>   * leaf nodes and even joining intermediate nodes up to and including
>   * the root node (a special case of an intermediate node).
> + *
> + * This routine is meant to function as either an inline or delayed operation,
> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> + * functions will need to handle this, and recall the function until a
> + * successful error code is returned.
>   */
>  STATIC int
>  xfs_attr_node_removename(
> @@ -1230,10 +1284,24 @@ xfs_attr_node_removename(
>  	struct xfs_inode	*dp = args->dp;
>  
>  	trace_xfs_attr_node_removename(args);
> +	state = args->dac.da_state;
> +	blk = args->dac.blk;
> +
> +	/* State machine switch */
> +	switch (args->dac.dela_state) {
> +	case XFS_DAS_RMTVAL_REMOVE:
> +		goto rm_node_blks;
> +	case XFS_DAS_RM_SHRINK:
> +		goto rm_shrink;
> +	default:
> +		break;
> +	}
>  
>  	error = xfs_attr_node_hasname(args, &state);
>  	if (error != -EEXIST)
>  		goto out;
> +	else
> +		error = 0;

This doesn't look necessary.

>  
>  	/*
>  	 * If there is an out-of-line value, de-allocate the blocks.
> @@ -1243,6 +1311,14 @@ xfs_attr_node_removename(
>  	blk = &state->path.blk[ state->path.active-1 ];
>  	ASSERT(blk->bp != NULL);
>  	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
> +
> +	/*
> +	 * Store blk and state in the context incase we need to cycle out the
> +	 * transaction
> +	 */
> +	args->dac.blk = blk;
> +	args->dac.da_state = state;
> +
>  	if (args->rmtblkno > 0) {
>  		/*
>  		 * Fill in disk block numbers in the state structure
> @@ -1261,13 +1337,21 @@ xfs_attr_node_removename(
>  		if (error)
>  			goto out;
>  
> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> +		error = xfs_attr_rmtval_invalidate(args);

Remind me why we lose the above trans roll? I vaguely recall that this
was intentional, but I could be mistaken...

>  		if (error)
>  			goto out;
> +	}
>  
> -		error = xfs_attr_rmtval_remove(args);
> -		if (error)
> -			goto out;
> +rm_node_blks:
> +
> +	if (args->rmtblkno > 0) {
> +		error = xfs_attr_rmtval_unmap(args);
> +
> +		if (error) {
> +			if (error == -EAGAIN)
> +				args->dac.dela_state = XFS_DAS_RMTVAL_REMOVE;

Might be helpful for the code labels to match the state names. I.e., use
das_rmtval_remove: for the label above.

> +			return error;
> +		}
>  
>  		/*
>  		 * Refill the state structure with buffers, the prior calls
> @@ -1293,17 +1377,15 @@ xfs_attr_node_removename(
>  		error = xfs_da3_join(state);
>  		if (error)
>  			goto out;
> -		error = xfs_defer_finish(&args->trans);
> -		if (error)
> -			goto out;
> -		/*
> -		 * Commit the Btree join operation and start a new trans.
> -		 */
> -		error = xfs_trans_roll_inode(&args->trans, dp);
> -		if (error)
> -			goto out;
> +
> +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
> +		args->dac.dela_state = XFS_DAS_RM_SHRINK;
> +		return -EAGAIN;
>  	}
>  
> +rm_shrink:
> +	args->dac.dela_state = XFS_DAS_RM_SHRINK;
> +

There's an xfs_defer_finish() call further down this function. Should
that be replaced with the flag?

Finally, I mentioned in a previous review that this function should
probably be further broken down before fitting in the state management
stuff. It doesn't look like that happened so I've attached a diff that
is just intended to give an idea of what I mean by sectioning off the
hunks that might be able to break down into helpers. The helpers
wouldn't contain any state management, so we create a clear separation
between the state code and functional components. I think this initial
refactoring would make the introduction of state much simpler (and
perhaps alleviate the need for the huge diagram). It might also be
interesting to see how much of the result could be folded up further
into _removename_iter()...
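
To be clear, the following is not the attached diff, just a toy userspace
illustration of the separation described above. The names (demo_attr,
demo_rmt_remove_one, demo_remove_entry, demo_shrink, demo_removename_iter) are
all hypothetical. The helpers know nothing about states, and the iter function
only sequences them and records where to resume.

```c
#include <errno.h>

/* Hypothetical states; not the XFS_DAS_* enum from the patch. */
enum demo_state {
	DEMO_INIT = 0,
	DEMO_RMTVAL_REMOVE,
	DEMO_RM_SHRINK,
	DEMO_DONE,
};

struct demo_attr {
	enum demo_state	state;
	int		rmt_blocks;	/* remote value blocks left */
	int		entries;	/* entries left in the toy tree */
	int		in_inode;	/* set once collapsed into the inode */
};

/* Helper: remove one remote block; returns nonzero if more remain. */
int demo_rmt_remove_one(struct demo_attr *a)
{
	if (a->rmt_blocks > 0)
		a->rmt_blocks--;
	return a->rmt_blocks > 0;
}

/* Helper: remove the entry; returns nonzero if the tree should shrink. */
int demo_remove_entry(struct demo_attr *a)
{
	a->entries--;
	return a->entries <= 1;
}

/* Helper: collapse what is left into the inode. */
void demo_shrink(struct demo_attr *a)
{
	a->in_inode = 1;
}

/* All state handling lives here; the helpers above carry none. */
int demo_removename_iter(struct demo_attr *a)
{
	switch (a->state) {
	case DEMO_INIT:
		a->state = DEMO_RMTVAL_REMOVE;
		/* fall through */
	case DEMO_RMTVAL_REMOVE:
		if (a->rmt_blocks > 0 && demo_rmt_remove_one(a))
			return -EAGAIN;	/* more blocks: roll, come back */
		if (!demo_remove_entry(a)) {
			a->state = DEMO_DONE;
			return 0;
		}
		a->state = DEMO_RM_SHRINK;
		return -EAGAIN;		/* roll before shrinking */
	case DEMO_RM_SHRINK:
		demo_shrink(a);
		a->state = DEMO_DONE;
		return 0;
	case DEMO_DONE:
		return 0;
	}
	return -EINVAL;
}
```

Each helper can be read and tested without reference to the state enum, and
the resume points are visible in one place as the case labels.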

Brian

>  	/*
>  	 * If the result is small enough, push it all into the inode.
>  	 */
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index ce7b039..ea873a5 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -155,6 +155,7 @@ int xfs_attr_set_args(struct xfs_da_args *args);
>  int xfs_attr_remove(struct xfs_inode *dp, struct xfs_name *name, int flags);
>  int xfs_has_attr(struct xfs_da_args *args);
>  int xfs_attr_remove_args(struct xfs_da_args *args);
> +int xfs_attr_remove_iter(struct xfs_da_args *args);
>  int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
>  		  int flags, struct attrlist_cursor_kern *cursor);
>  bool xfs_attr_namecheck(const void *name, size_t length);
> diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
> index 14f1be3..3c78498 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.h
> +++ b/fs/xfs/libxfs/xfs_da_btree.h
> @@ -50,9 +50,39 @@ enum xfs_dacmp {
>  };
>  
>  /*
> + * Enum values for xfs_delattr_context.da_state
> + *
> + * These values are used by delayed attribute operations to keep track  of where
> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> + * calling function to roll the transaction, and then recall the subroutine to
> + * finish the operation.  The enum is then used by the subroutine to jump back
> + * to where it was and resume executing where it left off.
> + */
> +enum xfs_delattr_state {
> +	XFS_DAS_RM_SHRINK,	/* We are shrinking the tree */
> +	XFS_DAS_RMTVAL_REMOVE,	/* We are removing remote value blocks */
> +};
> +
> +/*
> + * Defines for xfs_delattr_context.flags
> + */
> +#define	XFS_DAC_FINISH_TRANS	0x1 /* indicates to finish the transaction */
> +
> +/*
> + * Context used for keeping track of delayed attribute operations
> + */
> +struct xfs_delattr_context {
> +	struct xfs_da_state	*da_state;
> +	struct xfs_da_state_blk *blk;
> +	unsigned int		flags;
> +	enum xfs_delattr_state	dela_state;
> +};
> +
> +/*
>   * Structure to ease passing around component names.
>   */
>  typedef struct xfs_da_args {
> +	struct xfs_delattr_context dac; /* context used for delay attr ops */
>  	struct xfs_da_geometry *geo;	/* da block geometry */
>  	struct xfs_name	name;		/* name, length and argument  flags*/
>  	uint8_t		filetype;	/* filetype of inode for directories */
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 1887605..9a649d1 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -24,6 +24,8 @@
>  #include "xfs_rmap_btree.h"
>  #include "xfs_log.h"
>  #include "xfs_trans_priv.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_reflink.h"
>  #include "scrub/scrub.h"
> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> index 42ac847..d65e6d8 100644
> --- a/fs/xfs/xfs_acl.c
> +++ b/fs/xfs/xfs_acl.c
> @@ -10,6 +10,8 @@
>  #include "xfs_trans_resv.h"
>  #include "xfs_mount.h"
>  #include "xfs_inode.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_trace.h"
>  #include "xfs_error.h"
> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> index d37743b..881b9a4 100644
> --- a/fs/xfs/xfs_attr_list.c
> +++ b/fs/xfs/xfs_attr_list.c
> @@ -12,6 +12,7 @@
>  #include "xfs_trans_resv.h"
>  #include "xfs_mount.h"
>  #include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_inode.h"
>  #include "xfs_trans.h"
>  #include "xfs_bmap.h"
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 28c07c9..7c1d9da 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -15,6 +15,8 @@
>  #include "xfs_iwalk.h"
>  #include "xfs_itable.h"
>  #include "xfs_error.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_bmap.h"
>  #include "xfs_bmap_util.h"
> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> index 769581a..d504f8f 100644
> --- a/fs/xfs/xfs_ioctl32.c
> +++ b/fs/xfs/xfs_ioctl32.c
> @@ -17,6 +17,8 @@
>  #include "xfs_itable.h"
>  #include "xfs_fsops.h"
>  #include "xfs_rtalloc.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_ioctl.h"
>  #include "xfs_ioctl32.h"
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index e85bbf5..a2d299f 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -13,6 +13,8 @@
>  #include "xfs_inode.h"
>  #include "xfs_acl.h"
>  #include "xfs_quota.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_trans.h"
>  #include "xfs_trace.h"
> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> index 74133a5..d8dc72d 100644
> --- a/fs/xfs/xfs_xattr.c
> +++ b/fs/xfs/xfs_xattr.c
> @@ -10,6 +10,7 @@
>  #include "xfs_log_format.h"
>  #include "xfs_da_format.h"
>  #include "xfs_inode.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_acl.h"
>  
> -- 
> 2.7.4
>
Brian Foster Feb. 24, 2020, 5:03 p.m. UTC | #2
On Mon, Feb 24, 2020 at 10:25:55AM -0500, Brian Foster wrote:
> On Sat, Feb 22, 2020 at 07:06:05PM -0700, Allison Collins wrote:
> > This patch modifies the attr remove routines to be delay ready. This means they no
> > longer roll or commit transactions, but instead return -EAGAIN to have the calling
> > routine roll and refresh the transaction. In this series, xfs_attr_remove_args has
> > become xfs_attr_remove_iter, which uses a sort of state machine like switch to keep
> > track of where it was when EAGAIN was returned. xfs_attr_node_removename has also
> > been modified to use the switch, and a  new version of xfs_attr_remove_args
> > consists of a simple loop to refresh the transaction until the operation is
> > completed.
> > 
> > This patch also adds a new struct xfs_delattr_context, which we will use to keep
> > track of the current state of an attribute operation. The new xfs_delattr_state
> > enum is used to track various operations that are in progress so that we know not
> > to repeat them, and resume where we left off before EAGAIN was returned to cycle
> > out the transaction. Other members take the place of local variables that need
> > to retain their values across multiple function recalls.
> > 
> > Below is a state machine diagram for attr remove operations. The XFS_DAS_* states
> > indicate places where the function would return -EAGAIN, and then immediately
> > resume from after being recalled by the calling function.  States marked as a
> > "subroutine state" indicate that they belong to a subroutine, and so the calling
> > function needs to pass them back to that subroutine to allow it to finish where
> > it left off. But they otherwise do not have a role in the calling function other
> > than just passing through.
> > 
> >  xfs_attr_remove_iter()
> >          XFS_DAS_RM_SHRINK     ─┐
> >          (subroutine state)     │
> >                                 │
> >          XFS_DAS_RMTVAL_REMOVE ─┤
> >          (subroutine state)     │
> >                                 └─>xfs_attr_node_removename()
> >                                                  │
> >                                                  v
> >                                          need to remove
> >                                    ┌─n──  rmt blocks?
> >                                    │             │
> >                                    │             y
> >                                    │             │
> >                                    │             v
> >                                    │  ┌─>XFS_DAS_RMTVAL_REMOVE
> >                                    │  │          │
> >                                    │  │          v
> >                                    │  └──y── more blks
> >                                    │         to remove?
> >                                    │             │
> >                                    │             n
> >                                    │             │
> >                                    │             v
> >                                    │         need to
> >                                    └─────> shrink tree? ─n─┐
> >                                                  │         │
> >                                                  y         │
> >                                                  │         │
> >                                                  v         │
> >                                          XFS_DAS_RM_SHRINK │
> >                                                  │         │
> >                                                  v         │
> >                                                 done <─────┘
> > 
> 
> Wow. :P I guess I have nothing against verbose commit logs, but I wonder
> how useful this level of documentation is for a patch that shouldn't
> really change the existing flow of the operation.
> 
> > Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_attr.c     | 114 +++++++++++++++++++++++++++++++++++++------
> >  fs/xfs/libxfs/xfs_attr.h     |   1 +
> >  fs/xfs/libxfs/xfs_da_btree.h |  30 ++++++++++++
> >  fs/xfs/scrub/common.c        |   2 +
> >  fs/xfs/xfs_acl.c             |   2 +
> >  fs/xfs/xfs_attr_list.c       |   1 +
> >  fs/xfs/xfs_ioctl.c           |   2 +
> >  fs/xfs/xfs_ioctl32.c         |   2 +
> >  fs/xfs/xfs_iops.c            |   2 +
> >  fs/xfs/xfs_xattr.c           |   1 +
> >  10 files changed, 141 insertions(+), 16 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index 5d73bdf..cd3a3f7 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -368,11 +368,60 @@ xfs_has_attr(
> >   */
> >  int
> >  xfs_attr_remove_args(
> > +	struct xfs_da_args	*args)
> > +{
> > +	int			error = 0;
> > +	int			err2 = 0;
> > +
> > +	do {
> > +		error = xfs_attr_remove_iter(args);
> > +		if (error && error != -EAGAIN)
> > +			goto out;
> > +
> 
> I'm a little confused on the logic of this loop given that the only
> caller commits the transaction (which also finishes dfops). IOW, it
> seems we shouldn't ever need to finish/roll when error != -EAGAIN. If
> that is the case, this can be simplified to something like:
> 
> int
> xfs_attr_remove_args(
>         struct xfs_da_args      *args)
> {
>         int                     error;
> 
>         do {
>                 error = xfs_attr_remove_iter(args);
>                 if (error != -EAGAIN)
>                         break;
> 
>                 if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
>                         args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
>                         error = xfs_defer_finish(&args->trans);
>                         if (error)
>                                 break;
>                 }
> 
>                 error = xfs_trans_roll_inode(&args->trans, args->dp);
>                 if (error)
>                         break;
>         } while (true);
> 
>         return error;
> }
> 
> That has the added benefit of eliminating the whole err2 pattern, which
> always strikes me as a landmine.
> 
> > +		if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
> 
> BTW, _FINISH_TRANS also seems misnamed given that we finish deferred
> operations, not necessarily the transaction. XFS_DAC_DEFER_FINISH?
> 
> > +			args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
> > +
> > +			err2 = xfs_defer_finish(&args->trans);
> > +			if (err2) {
> > +				error = err2;
> > +				goto out;
> > +			}
> > +		}
> > +
> > +		err2 = xfs_trans_roll_inode(&args->trans, args->dp);
> > +		if (err2) {
> > +			error = err2;
> > +			goto out;
> > +		}
> > +
> > +	} while (error == -EAGAIN);
> > +out:
> > +	return error;
> > +}
> > +
> > +/*
> > + * Remove the attribute specified in @args.
> > + *
> > + * This function may return -EAGAIN to signal that the transaction needs to be
> > + * rolled.  Callers should continue calling this function until they receive a
> > + * return value other than -EAGAIN.
> > + */
> > +int
> > +xfs_attr_remove_iter(
> >  	struct xfs_da_args      *args)
> >  {
> >  	struct xfs_inode	*dp = args->dp;
> >  	int			error;
> >  
> > +	/* State machine switch */
> > +	switch (args->dac.dela_state) {
> > +	case XFS_DAS_RM_SHRINK:
> > +	case XFS_DAS_RMTVAL_REMOVE:
> > +		goto node;
> > +	default:
> > +		break;
> > +	}
> > +
> >  	if (!xfs_inode_hasattr(dp)) {
> >  		error = -ENOATTR;
> >  	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> > @@ -381,6 +430,7 @@ xfs_attr_remove_args(
> >  	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> >  		error = xfs_attr_leaf_removename(args);
> >  	} else {
> > +node:
> >  		error = xfs_attr_node_removename(args);
> >  	}
> >  
> > @@ -895,9 +945,8 @@ xfs_attr_leaf_removename(
> >  		/* bp is gone due to xfs_da_shrink_inode */
> >  		if (error)
> >  			return error;
> > -		error = xfs_defer_finish(&args->trans);
> > -		if (error)
> > -			return error;
> > +
> > +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
> >  	}
> >  	return 0;
> >  }
> > @@ -1218,6 +1267,11 @@ xfs_attr_node_addname(
> >   * This will involve walking down the Btree, and may involve joining
> >   * leaf nodes and even joining intermediate nodes up to and including
> >   * the root node (a special case of an intermediate node).
> > + *
> > + * This routine is meant to function as either an inline or delayed operation,
> > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > + * functions will need to handle this, and recall the function until a
> > + * successful error code is returned.
> >   */
> >  STATIC int
> >  xfs_attr_node_removename(
> > @@ -1230,10 +1284,24 @@ xfs_attr_node_removename(
> >  	struct xfs_inode	*dp = args->dp;
> >  
> >  	trace_xfs_attr_node_removename(args);
> > +	state = args->dac.da_state;
> > +	blk = args->dac.blk;
> > +
> > +	/* State machine switch */
> > +	switch (args->dac.dela_state) {
> > +	case XFS_DAS_RMTVAL_REMOVE:
> > +		goto rm_node_blks;
> > +	case XFS_DAS_RM_SHRINK:
> > +		goto rm_shrink;
> > +	default:
> > +		break;
> > +	}
> >  
> >  	error = xfs_attr_node_hasname(args, &state);
> >  	if (error != -EEXIST)
> >  		goto out;
> > +	else
> > +		error = 0;
> 
> This doesn't look necessary.
> 
> >  
> >  	/*
> >  	 * If there is an out-of-line value, de-allocate the blocks.
> > @@ -1243,6 +1311,14 @@ xfs_attr_node_removename(
> >  	blk = &state->path.blk[ state->path.active-1 ];
> >  	ASSERT(blk->bp != NULL);
> >  	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
> > +
> > +	/*
> > +	 * Store blk and state in the context incase we need to cycle out the
> > +	 * transaction
> > +	 */
> > +	args->dac.blk = blk;
> > +	args->dac.da_state = state;
> > +
> >  	if (args->rmtblkno > 0) {
> >  		/*
> >  		 * Fill in disk block numbers in the state structure
> > @@ -1261,13 +1337,21 @@ xfs_attr_node_removename(
> >  		if (error)
> >  			goto out;
> >  
> > -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > +		error = xfs_attr_rmtval_invalidate(args);
> 
> Remind me why we lose the above trans roll? I vaguely recall that this
> was intentional, but I could be mistaken...
> 
> >  		if (error)
> >  			goto out;
> > +	}
> >  
> > -		error = xfs_attr_rmtval_remove(args);
> > -		if (error)
> > -			goto out;
> > +rm_node_blks:
> > +
> > +	if (args->rmtblkno > 0) {
> > +		error = xfs_attr_rmtval_unmap(args);
> > +
> > +		if (error) {
> > +			if (error == -EAGAIN)
> > +				args->dac.dela_state = XFS_DAS_RMTVAL_REMOVE;
> 
> Might be helpful for the code labels to match the state names. I.e., use
> das_rmtval_remove: for the label above.
> 
> > +			return error;
> > +		}
> >  
> >  		/*
> >  		 * Refill the state structure with buffers, the prior calls
> > @@ -1293,17 +1377,15 @@ xfs_attr_node_removename(
> >  		error = xfs_da3_join(state);
> >  		if (error)
> >  			goto out;
> > -		error = xfs_defer_finish(&args->trans);
> > -		if (error)
> > -			goto out;
> > -		/*
> > -		 * Commit the Btree join operation and start a new trans.
> > -		 */
> > -		error = xfs_trans_roll_inode(&args->trans, dp);
> > -		if (error)
> > -			goto out;
> > +
> > +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
> > +		args->dac.dela_state = XFS_DAS_RM_SHRINK;
> > +		return -EAGAIN;
> >  	}
> >  
> > +rm_shrink:
> > +	args->dac.dela_state = XFS_DAS_RM_SHRINK;
> > +
> 
> There's an xfs_defer_finish() call further down this function. Should
> that be replaced with the flag?
> 
> Finally, I mentioned in a previous review that this function should
> probably be further broken down before fitting in the state management
> stuff. It doesn't look like that happened so I've attached a diff that
> is just intended to give an idea of what I mean by sectioning off the
> hunks that might be able to break down into helpers. The helpers
> wouldn't contain any state management, so we create a clear separation
> between the state code and functional components. I think this initial
> refactoring would make the introduction of state much more simple (and
> perhaps alleviate the need for the huge diagram). It might also be
> interesting to see how much of the result could be folded up further
> into _removename_iter()...
> 

Gah.. attached for real this time.

Brian

> Brian
> 
> >  	/*
> >  	 * If the result is small enough, push it all into the inode.
> >  	 */
> > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > index ce7b039..ea873a5 100644
> > --- a/fs/xfs/libxfs/xfs_attr.h
> > +++ b/fs/xfs/libxfs/xfs_attr.h
> > @@ -155,6 +155,7 @@ int xfs_attr_set_args(struct xfs_da_args *args);
> >  int xfs_attr_remove(struct xfs_inode *dp, struct xfs_name *name, int flags);
> >  int xfs_has_attr(struct xfs_da_args *args);
> >  int xfs_attr_remove_args(struct xfs_da_args *args);
> > +int xfs_attr_remove_iter(struct xfs_da_args *args);
> >  int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
> >  		  int flags, struct attrlist_cursor_kern *cursor);
> >  bool xfs_attr_namecheck(const void *name, size_t length);
> > diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
> > index 14f1be3..3c78498 100644
> > --- a/fs/xfs/libxfs/xfs_da_btree.h
> > +++ b/fs/xfs/libxfs/xfs_da_btree.h
> > @@ -50,9 +50,39 @@ enum xfs_dacmp {
> >  };
> >  
> >  /*
> > + * Enum values for xfs_delattr_context.da_state
> > + *
> > + * These values are used by delayed attribute operations to keep track  of where
> > + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> > + * calling function to roll the transaction, and then recall the subroutine to
> > + * finish the operation.  The enum is then used by the subroutine to jump back
> > + * to where it was and resume executing where it left off.
> > + */
> > +enum xfs_delattr_state {
> > +	XFS_DAS_RM_SHRINK,	/* We are shrinking the tree */
> > +	XFS_DAS_RMTVAL_REMOVE,	/* We are removing remote value blocks */
> > +};
> > +
> > +/*
> > + * Defines for xfs_delattr_context.flags
> > + */
> > +#define	XFS_DAC_FINISH_TRANS	0x1 /* indicates to finish the transaction */
> > +
> > +/*
> > + * Context used for keeping track of delayed attribute operations
> > + */
> > +struct xfs_delattr_context {
> > +	struct xfs_da_state	*da_state;
> > +	struct xfs_da_state_blk *blk;
> > +	unsigned int		flags;
> > +	enum xfs_delattr_state	dela_state;
> > +};
> > +
> > +/*
> >   * Structure to ease passing around component names.
> >   */
> >  typedef struct xfs_da_args {
> > +	struct xfs_delattr_context dac; /* context used for delay attr ops */
> >  	struct xfs_da_geometry *geo;	/* da block geometry */
> >  	struct xfs_name	name;		/* name, length and argument  flags*/
> >  	uint8_t		filetype;	/* filetype of inode for directories */
> > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > index 1887605..9a649d1 100644
> > --- a/fs/xfs/scrub/common.c
> > +++ b/fs/xfs/scrub/common.c
> > @@ -24,6 +24,8 @@
> >  #include "xfs_rmap_btree.h"
> >  #include "xfs_log.h"
> >  #include "xfs_trans_priv.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> >  #include "xfs_attr.h"
> >  #include "xfs_reflink.h"
> >  #include "scrub/scrub.h"
> > diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> > index 42ac847..d65e6d8 100644
> > --- a/fs/xfs/xfs_acl.c
> > +++ b/fs/xfs/xfs_acl.c
> > @@ -10,6 +10,8 @@
> >  #include "xfs_trans_resv.h"
> >  #include "xfs_mount.h"
> >  #include "xfs_inode.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> >  #include "xfs_attr.h"
> >  #include "xfs_trace.h"
> >  #include "xfs_error.h"
> > diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> > index d37743b..881b9a4 100644
> > --- a/fs/xfs/xfs_attr_list.c
> > +++ b/fs/xfs/xfs_attr_list.c
> > @@ -12,6 +12,7 @@
> >  #include "xfs_trans_resv.h"
> >  #include "xfs_mount.h"
> >  #include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> >  #include "xfs_inode.h"
> >  #include "xfs_trans.h"
> >  #include "xfs_bmap.h"
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 28c07c9..7c1d9da 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -15,6 +15,8 @@
> >  #include "xfs_iwalk.h"
> >  #include "xfs_itable.h"
> >  #include "xfs_error.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> >  #include "xfs_attr.h"
> >  #include "xfs_bmap.h"
> >  #include "xfs_bmap_util.h"
> > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > index 769581a..d504f8f 100644
> > --- a/fs/xfs/xfs_ioctl32.c
> > +++ b/fs/xfs/xfs_ioctl32.c
> > @@ -17,6 +17,8 @@
> >  #include "xfs_itable.h"
> >  #include "xfs_fsops.h"
> >  #include "xfs_rtalloc.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> >  #include "xfs_attr.h"
> >  #include "xfs_ioctl.h"
> >  #include "xfs_ioctl32.h"
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index e85bbf5..a2d299f 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -13,6 +13,8 @@
> >  #include "xfs_inode.h"
> >  #include "xfs_acl.h"
> >  #include "xfs_quota.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> >  #include "xfs_attr.h"
> >  #include "xfs_trans.h"
> >  #include "xfs_trace.h"
> > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> > index 74133a5..d8dc72d 100644
> > --- a/fs/xfs/xfs_xattr.c
> > +++ b/fs/xfs/xfs_xattr.c
> > @@ -10,6 +10,7 @@
> >  #include "xfs_log_format.h"
> >  #include "xfs_da_format.h"
> >  #include "xfs_inode.h"
> > +#include "xfs_da_btree.h"
> >  #include "xfs_attr.h"
> >  #include "xfs_acl.h"
> >  
> > -- 
> > 2.7.4
> > 
>
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index cd3a3f75c429..e0eaa274b70b 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -1297,6 +1297,7 @@ xfs_attr_node_removename(
 		break;
 	}
 
+#if 0
 	error = xfs_attr_node_hasname(args, &state);
 	if (error != -EEXIST)
 		goto out;
@@ -1341,9 +1342,13 @@ xfs_attr_node_removename(
 		if (error)
 			goto out;
 	}
+#else
+	error = xfs_attr_node_removename_setup();
+#endif
 
 rm_node_blks:
 
+#if 0
 	if (args->rmtblkno > 0) {
 		error = xfs_attr_rmtval_unmap(args);
 
@@ -1361,6 +1366,11 @@ xfs_attr_node_removename(
 		if (error)
 			goto out;
 	}
+#else
+	args->dac.dela_state = XFS_DAS_RMTVAL_REMOVE;
+	error = xfs_attr_node_removename_rmt();
+	/* -EAGAIN */
+#endif
 
 	/*
 	 * Remove the name and update the hashvals in the tree.
@@ -1370,6 +1380,7 @@ xfs_attr_node_removename(
 	retval = xfs_attr3_leaf_remove(blk->bp, args);
 	xfs_da3_fixhashpath(state, &state->path);
 
+#if 0
 	/*
 	 * Check to see if the tree needs to be collapsed.
 	 */
@@ -1413,6 +1424,12 @@ xfs_attr_node_removename(
 			xfs_trans_brelse(args->trans, bp);
 	}
 	error = 0;
+#else
+rm_shrink:
+	args->dac.dela_state = XFS_DAS_RM_SHRINK;
+	error = xfs_attr_node_removename_shrink();
+	/* -EAGAIN */
+#endif
 
 out:
 	if (state)
Allison Henderson Feb. 24, 2020, 11:14 p.m. UTC | #3
On 2/24/20 8:25 AM, Brian Foster wrote:
> On Sat, Feb 22, 2020 at 07:06:05PM -0700, Allison Collins wrote:
>> This patch modifies the attr remove routines to be delay ready. This means they no
>> longer roll or commit transactions, but instead return -EAGAIN to have the calling
>> routine roll and refresh the transaction. In this series, xfs_attr_remove_args has
>> become xfs_attr_remove_iter, which uses a sort of state machine like switch to keep
>> track of where it was when EAGAIN was returned. xfs_attr_node_removename has also
>> been modified to use the switch, and a  new version of xfs_attr_remove_args
>> consists of a simple loop to refresh the transaction until the operation is
>> completed.
>>
>> This patch also adds a new struct xfs_delattr_context, which we will use to keep
>> track of the current state of an attribute operation. The new xfs_delattr_state
>> enum is used to track various operations that are in progress so that we know not
>> to repeat them, and resume where we left off before EAGAIN was returned to cycle
>> out the transaction. Other members take the place of local variables that need
>> to retain their values across multiple function recalls.
>>
>> Below is a state machine diagram for attr remove operations. The XFS_DAS_* states
>> indicate places where the function would return -EAGAIN, and then immediately
>> resume from after being recalled by the calling function.  States marked as a
>> "subroutine state" indicate that they belong to a subroutine, and so the calling
>> function needs to pass them back to that subroutine to allow it to finish where
>> it left off. But they otherwise do not have a role in the calling function other
>> than just passing through.
>>
>>   xfs_attr_remove_iter()
>>           XFS_DAS_RM_SHRINK     ─┐
>>           (subroutine state)     │
>>                                  │
>>           XFS_DAS_RMTVAL_REMOVE ─┤
>>           (subroutine state)     │
>>                                  └─>xfs_attr_node_removename()
>>                                                   │
>>                                                   v
>>                                           need to remove
>>                                     ┌─n──  rmt blocks?
>>                                     │             │
>>                                     │             y
>>                                     │             │
>>                                     │             v
>>                                     │  ┌─>XFS_DAS_RMTVAL_REMOVE
>>                                     │  │          │
>>                                     │  │          v
>>                                     │  └──y── more blks
>>                                     │         to remove?
>>                                     │             │
>>                                     │             n
>>                                     │             │
>>                                     │             v
>>                                     │         need to
>>                                     └─────> shrink tree? ─n─┐
>>                                                   │         │
>>                                                   y         │
>>                                                   │         │
>>                                                   v         │
>>                                           XFS_DAS_RM_SHRINK │
>>                                                   │         │
>>                                                   v         │
>>                                                  done <─────┘
>>
> 
> Wow. :P I guess I have nothing against verbose commit logs, but I wonder
> how useful this level of documentation is for a patch that shouldn't
> really change the existing flow of the operation.

Yes, Darrick had requested a diagram in the last review, so I put 
this together.  I wasn't sure where the best place to put it was, so 
I put it here at least for now.  I have no idea if there is a limit on 
commit message length, but if there is, I'm pretty sure I blew right 
past it in this patch and the next.  Maybe it can just stay 
here for now while we work through things?
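
For anyone tracing the diagram by hand, here is a minimal userspace sketch of the same flow.  Everything in it is a hypothetical stand-in (das_state, remove_ctx, remove_iter and count_rolls are not kernel names), and it assumes one remote block is unmapped per pass.  It only shows how the switch at the top jumps back to the label where the previous pass returned -EAGAIN.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the patch's enum and context; not kernel types. */
enum das_state { DAS_UNINIT, DAS_RMTVAL_REMOVE, DAS_RM_SHRINK };

struct remove_ctx {
	enum das_state	state;		/* where to resume after -EAGAIN */
	int		rmt_blocks;	/* remote value blocks still mapped */
	int		need_shrink;	/* nonzero if the tree must be collapsed */
	int		shrunk;		/* set once the shrink step has run */
};

/*
 * One pass of the remove operation.  Returns -EAGAIN whenever the caller
 * must roll the transaction and call again, 0 when the operation is done.
 */
int remove_iter(struct remove_ctx *ctx)
{
	/* State machine switch: jump back to where the last pass stopped. */
	switch (ctx->state) {
	case DAS_RMTVAL_REMOVE:
		goto rmtval_remove;
	case DAS_RM_SHRINK:
		goto rm_shrink;
	default:
		break;
	}

rmtval_remove:
	if (ctx->rmt_blocks > 0) {
		/* Unmap one block; if more remain, ask the caller for a roll. */
		ctx->rmt_blocks--;
		if (ctx->rmt_blocks > 0) {
			ctx->state = DAS_RMTVAL_REMOVE;
			return -EAGAIN;
		}
	}

	if (ctx->need_shrink) {
		/* The join would be logged here; resume at rm_shrink next pass. */
		ctx->state = DAS_RM_SHRINK;
		return -EAGAIN;
	}

rm_shrink:
	if (ctx->need_shrink)
		ctx->shrunk = 1;
	return 0;
}

/* Drive remove_iter() to completion and count how often it asked for a roll. */
int count_rolls(struct remove_ctx *ctx)
{
	int rolls = 0;

	while (remove_iter(ctx) == -EAGAIN)
		rolls++;
	return rolls;
}
```

With two remote blocks and a shrink pending, this sketch asks for a roll twice: once between the two unmaps and once before the shrink.  With nothing to remove and no shrink, it finishes on the first pass.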

> 
>> Signed-off-by: Allison Collins <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c     | 114 +++++++++++++++++++++++++++++++++++++------
>>   fs/xfs/libxfs/xfs_attr.h     |   1 +
>>   fs/xfs/libxfs/xfs_da_btree.h |  30 ++++++++++++
>>   fs/xfs/scrub/common.c        |   2 +
>>   fs/xfs/xfs_acl.c             |   2 +
>>   fs/xfs/xfs_attr_list.c       |   1 +
>>   fs/xfs/xfs_ioctl.c           |   2 +
>>   fs/xfs/xfs_ioctl32.c         |   2 +
>>   fs/xfs/xfs_iops.c            |   2 +
>>   fs/xfs/xfs_xattr.c           |   1 +
>>   10 files changed, 141 insertions(+), 16 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index 5d73bdf..cd3a3f7 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -368,11 +368,60 @@ xfs_has_attr(
>>    */
>>   int
>>   xfs_attr_remove_args(
>> +	struct xfs_da_args	*args)
>> +{
>> +	int			error = 0;
>> +	int			err2 = 0;
>> +
>> +	do {
>> +		error = xfs_attr_remove_iter(args);
>> +		if (error && error != -EAGAIN)
>> +			goto out;
>> +
> 
> I'm a little confused on the logic of this loop given that the only
> caller commits the transaction (which also finishes dfops). IOW, it
> seems we shouldn't ever need to finish/roll when error != -EAGAIN. If
> that is the case, this can be simplified to something like:
Well, we need to do it when error == -EAGAIN or 0, right? Which I think 
better imitates the defer_finish routines.  That's why a lot of the 
existing code that ends by finishing off a transaction just sort of 
gets sawed off at the end. Otherwise those routines would need one more 
state just to return -EAGAIN as the last thing they do. Did that make sense?
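
To make the difference between the two loop shapes concrete, here is a small userspace sketch.  All names are hypothetical (mock_args, mock_iter, DAC_DEFER_FINISH and both loop functions), and the counters only stand in for xfs_defer_finish() and xfs_trans_roll_inode().  It assumes the iterator sets the flag on its final pass and returns 0, which is the "sawed off" tail described above.

```c
#include <assert.h>
#include <errno.h>

#define DAC_DEFER_FINISH 0x1	/* hypothetical, mirrors XFS_DAC_FINISH_TRANS */

struct mock_args {
	unsigned int	flags;
	int		passes_left;	/* -EAGAIN returns still to come */
	int		rolls;		/* stand-in for xfs_trans_roll_inode() */
	int		finishes;	/* stand-in for xfs_defer_finish() */
};

/* Mock iterator: asks for passes_left rolls, then completes with the flag set. */
int mock_iter(struct mock_args *args)
{
	if (args->passes_left > 0) {
		args->passes_left--;
		return -EAGAIN;
	}
	args->flags |= DAC_DEFER_FINISH;
	return 0;
}

/* Finish deferred work if requested, then roll; shared by both loop shapes. */
void finish_and_roll(struct mock_args *args)
{
	if (args->flags & DAC_DEFER_FINISH) {
		args->flags &= ~DAC_DEFER_FINISH;
		args->finishes++;
	}
	args->rolls++;
}

/* Shape used in the patch: finish and roll after every pass, even the last. */
int loop_roll_on_success(struct mock_args *args)
{
	int error;

	do {
		error = mock_iter(args);
		if (error && error != -EAGAIN)
			return error;
		finish_and_roll(args);
	} while (error == -EAGAIN);
	return error;
}

/* Shape suggested in review: roll only on -EAGAIN, leave the rest to commit. */
int loop_roll_on_eagain(struct mock_args *args)
{
	int error;

	while ((error = mock_iter(args)) == -EAGAIN)
		finish_and_roll(args);
	return error;
}
```

In this sketch the first shape does one extra roll and consumes the flag itself.  The second shape returns with the flag still set, so it is only equivalent if the caller's commit finishes the deferred operations, which is the assumption behind the review suggestion.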

> 
> int
> xfs_attr_remove_args(
>          struct xfs_da_args      *args)
> {
>          int                     error;
> 
>          do {
>                  error = xfs_attr_remove_iter(args);
>                  if (error != -EAGAIN)
>                          break;
> 
>                  if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
>                          args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
>                          error = xfs_defer_finish(&args->trans);
>                          if (error)
>                                  break;
>                  }
> 
>                  error = xfs_trans_roll_inode(&args->trans, args->dp);
>                  if (error)
>                          break;
>          } while (true);
> 
>          return error;
> }
> 
> That has the added benefit of eliminating the whole err2 pattern, which
> always strikes me as a landmine.
> 
>> +		if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
> 
> BTW, _FINISH_TRANS also seems misnamed given that we finish deferred
> operations, not necessarily the transaction. XFS_DAC_DEFER_FINISH?
Sure, will update.

> 
>> +			args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
>> +
>> +			err2 = xfs_defer_finish(&args->trans);
>> +			if (err2) {
>> +				error = err2;
>> +				goto out;
>> +			}
>> +		}
>> +
>> +		err2 = xfs_trans_roll_inode(&args->trans, args->dp);
>> +		if (err2) {
>> +			error = err2;
>> +			goto out;
>> +		}
>> +
>> +	} while (error == -EAGAIN);
>> +out:
>> +	return error;
>> +}
>> +
>> +/*
>> + * Remove the attribute specified in @args.
>> + *
>> + * This function may return -EAGAIN to signal that the transaction needs to be
>> + * rolled.  Callers should continue calling this function until they receive a
>> + * return value other than -EAGAIN.
>> + */
>> +int
>> +xfs_attr_remove_iter(
>>   	struct xfs_da_args      *args)
>>   {
>>   	struct xfs_inode	*dp = args->dp;
>>   	int			error;
>>   
>> +	/* State machine switch */
>> +	switch (args->dac.dela_state) {
>> +	case XFS_DAS_RM_SHRINK:
>> +	case XFS_DAS_RMTVAL_REMOVE:
>> +		goto node;
>> +	default:
>> +		break;
>> +	}
>> +
>>   	if (!xfs_inode_hasattr(dp)) {
>>   		error = -ENOATTR;
>>   	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
>> @@ -381,6 +430,7 @@ xfs_attr_remove_args(
>>   	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>>   		error = xfs_attr_leaf_removename(args);
>>   	} else {
>> +node:
>>   		error = xfs_attr_node_removename(args);
>>   	}
>>   
>> @@ -895,9 +945,8 @@ xfs_attr_leaf_removename(
>>   		/* bp is gone due to xfs_da_shrink_inode */
>>   		if (error)
>>   			return error;
>> -		error = xfs_defer_finish(&args->trans);
>> -		if (error)
>> -			return error;
>> +
>> +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
>>   	}
>>   	return 0;
>>   }
>> @@ -1218,6 +1267,11 @@ xfs_attr_node_addname(
>>    * This will involve walking down the Btree, and may involve joining
>>    * leaf nodes and even joining intermediate nodes up to and including
>>    * the root node (a special case of an intermediate node).
>> + *
>> + * This routine is meant to function as either an inline or delayed operation,
>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>> + * functions will need to handle this, and recall the function until a
>> + * successful error code is returned.
>>    */
>>   STATIC int
>>   xfs_attr_node_removename(
>> @@ -1230,10 +1284,24 @@ xfs_attr_node_removename(
>>   	struct xfs_inode	*dp = args->dp;
>>   
>>   	trace_xfs_attr_node_removename(args);
>> +	state = args->dac.da_state;
>> +	blk = args->dac.blk;
>> +
>> +	/* State machine switch */
>> +	switch (args->dac.dela_state) {
>> +	case XFS_DAS_RMTVAL_REMOVE:
>> +		goto rm_node_blks;
>> +	case XFS_DAS_RM_SHRINK:
>> +		goto rm_shrink;
>> +	default:
>> +		break;
>> +	}
>>   
>>   	error = xfs_attr_node_hasname(args, &state);
>>   	if (error != -EEXIST)
>>   		goto out;
>> +	else
>> +		error = 0;
> 
> This doesn't look necessary.
Well, at this point error has to be -EEXIST.  Which is great because we 
need the attr to exist, but we don't want to return that as the error for 
this function, which can happen if error is not otherwise set.
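
A tiny userspace sketch of the case I mean, with hypothetical names throughout (mock_hasname and mock_removename are not real functions, and -ENOENT stands in for the not-found code).  It assumes a path through the function where nothing else assigns error before the exit label.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lookup: -EEXIST when the name is present, -ENOENT otherwise. */
int mock_hasname(int present)
{
	return present ? -EEXIST : -ENOENT;
}

/*
 * Remove path where error is only reassigned on the remote-value branch.
 * If clear_eexist is zero, the lookup's -EEXIST can leak to the caller.
 */
int mock_removename(int present, int has_remote, int clear_eexist)
{
	int error;

	error = mock_hasname(present);
	if (error != -EEXIST)
		goto out;
	if (clear_eexist)
		error = 0;

	if (has_remote)
		error = 0;	/* stands in for a helper call that overwrites error */

out:
	return error;
}
```

Whether the reset is needed in the real function depends on whether every path after the lookup assigns error again, so this only shows the failure mode and does not prove it is reachable.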

> 
>>   
>>   	/*
>>   	 * If there is an out-of-line value, de-allocate the blocks.
>> @@ -1243,6 +1311,14 @@ xfs_attr_node_removename(
>>   	blk = &state->path.blk[ state->path.active-1 ];
>>   	ASSERT(blk->bp != NULL);
>>   	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
>> +
>> +	/*
>> +	 * Store blk and state in the context incase we need to cycle out the
>> +	 * transaction
>> +	 */
>> +	args->dac.blk = blk;
>> +	args->dac.da_state = state;
>> +
>>   	if (args->rmtblkno > 0) {
>>   		/*
>>   		 * Fill in disk block numbers in the state structure
>> @@ -1261,13 +1337,21 @@ xfs_attr_node_removename(
>>   		if (error)
>>   			goto out;
>>   
>> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
>> +		error = xfs_attr_rmtval_invalidate(args);
> 
> Remind me why we lose the above trans roll? I vaguely recall that this
> was intentional, but I could be mistaken...
I think we removed it in v5.  We used to have an XFS_DAS_RM_INVALIDATE 
state, but then we reasoned that because these are just in-core changes, 
we didn't need it, so we eliminated this state entirely.

Maybe I just add a comment here?  Just as a reminder.

> 
>>   		if (error)
>>   			goto out;
>> +	}
>>   
>> -		error = xfs_attr_rmtval_remove(args);
>> -		if (error)
>> -			goto out;
>> +rm_node_blks:
>> +
>> +	if (args->rmtblkno > 0) {
>> +		error = xfs_attr_rmtval_unmap(args);
>> +
>> +		if (error) {
>> +			if (error == -EAGAIN)
>> +				args->dac.dela_state = XFS_DAS_RMTVAL_REMOVE;
> 
> Might be helpful for the code labels to match the state names. I.e., use
> das_rmtval_remove: for the label above.
Sure, I can update the labels to add the das prefix.

> 
>> +			return error;
>> +		}
>>   
>>   		/*
>>   		 * Refill the state structure with buffers, the prior calls
>> @@ -1293,17 +1377,15 @@ xfs_attr_node_removename(
>>   		error = xfs_da3_join(state);
>>   		if (error)
>>   			goto out;
>> -		error = xfs_defer_finish(&args->trans);
>> -		if (error)
>> -			goto out;
>> -		/*
>> -		 * Commit the Btree join operation and start a new trans.
>> -		 */
>> -		error = xfs_trans_roll_inode(&args->trans, dp);
>> -		if (error)
>> -			goto out;
>> +
>> +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
>> +		args->dac.dela_state = XFS_DAS_RM_SHRINK;
>> +		return -EAGAIN;
>>   	}
>>   
>> +rm_shrink:
>> +	args->dac.dela_state = XFS_DAS_RM_SHRINK;
>> +
> 
> There's an xfs_defer_finish() call further down this function. Should
> that be replaced with the flag?
> 
> Finally, I mentioned in a previous review that this function should
> probably be further broken down before fitting in the state management
> stuff. It doesn't look like that happened so I've attached a diff that
> is just intended to give an idea of what I mean by sectioning off the
> hunks that might be able to break down into helpers. The helpers
> wouldn't contain any state management, so we create a clear separation
> between the state code and functional components. 
Yes, it's xfs_attr_node_shrink in patch 15.  I moved it to another patch 
to try and keep the activity in this one to a minimum.  Apologies if it 
surprised you!  And then I mistakenly took the XFS_DAC_FINISH_TRANS 
flag with it.  I meant to keep all the state machine stuff here.  Will fix!

> I think this initial
> refactoring would make the introduction of state much more simple 

I guess I didn't think people would be partial to introducing helpers 
before or after the state logic.  I put them after in this set because 
the states are visible now, so I thought it would make the goal of 
modularizing code between the states more clear to folks.  Do you think 
I should move it back behind the state machine patches?

> (and
> perhaps alleviate the need for the huge diagram). 
Well, I get the impression that people find the series sort of scary and 
maybe the diagrams help them a bit.  Maybe we can take them out later 
after people feel like they are comfortable with things?

> It might also be
> interesting to see how much of the result could be folded up further
> into _removename_iter()...

Yes, I think that is the goal we're reaching for.  I will add the other 
helpers I see in your diff too.

Thanks for the reviews!
Allison
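
As a rough illustration of the separation being discussed, here is a userspace sketch in which the helpers carry no state and the iterator holds only the state transitions.  The names (rm_ctx, helper_rmt_unmap, helper_shrink, removename_iter, DAS_DONE) are hypothetical and are not the helpers from the attached diff.  It assumes one remote block is unmapped per call and that a shrink is always performed.

```c
#include <assert.h>
#include <errno.h>

enum das_state { DAS_UNINIT, DAS_RMTVAL_REMOVE, DAS_RM_SHRINK, DAS_DONE };

struct rm_ctx {
	enum das_state	state;		/* resume point, owned by the iterator */
	int		rmt_blocks;	/* remote value blocks still mapped */
	int		shrunk;		/* set once the shrink helper has run */
};

/* Functional helpers: no knowledge of states, they only do the work. */
int helper_rmt_unmap(struct rm_ctx *ctx)
{
	if (ctx->rmt_blocks > 0)
		ctx->rmt_blocks--;
	return ctx->rmt_blocks > 0 ? -EAGAIN : 0;
}

int helper_shrink(struct rm_ctx *ctx)
{
	ctx->shrunk = 1;
	return 0;
}

/* State code: one case per step, with every transition visible in one place. */
int removename_iter(struct rm_ctx *ctx)
{
	int error;

	switch (ctx->state) {
	case DAS_UNINIT:
		ctx->state = DAS_RMTVAL_REMOVE;
		/* fall through */
	case DAS_RMTVAL_REMOVE:
		error = helper_rmt_unmap(ctx);
		if (error)
			return error;	/* -EAGAIN: stay in this state */
		ctx->state = DAS_RM_SHRINK;
		return -EAGAIN;
	case DAS_RM_SHRINK:
		error = helper_shrink(ctx);
		if (error)
			return error;
		ctx->state = DAS_DONE;
		return 0;
	default:
		return -EINVAL;
	}
}
```

With this shape the goto labels disappear, and a reader can check the transitions without reading the helper bodies.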

> 
> Brian
> 
>>   	/*
>>   	 * If the result is small enough, push it all into the inode.
>>   	 */
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index ce7b039..ea873a5 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -155,6 +155,7 @@ int xfs_attr_set_args(struct xfs_da_args *args);
>>   int xfs_attr_remove(struct xfs_inode *dp, struct xfs_name *name, int flags);
>>   int xfs_has_attr(struct xfs_da_args *args);
>>   int xfs_attr_remove_args(struct xfs_da_args *args);
>> +int xfs_attr_remove_iter(struct xfs_da_args *args);
>>   int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
>>   		  int flags, struct attrlist_cursor_kern *cursor);
>>   bool xfs_attr_namecheck(const void *name, size_t length);
>> diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
>> index 14f1be3..3c78498 100644
>> --- a/fs/xfs/libxfs/xfs_da_btree.h
>> +++ b/fs/xfs/libxfs/xfs_da_btree.h
>> @@ -50,9 +50,39 @@ enum xfs_dacmp {
>>   };
>>   
>>   /*
>> + * Enum values for xfs_delattr_context.da_state
>> + *
>> + * These values are used by delayed attribute operations to keep track  of where
>> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
>> + * calling function to roll the transaction, and then recall the subroutine to
>> + * finish the operation.  The enum is then used by the subroutine to jump back
>> + * to where it was and resume executing where it left off.
>> + */
>> +enum xfs_delattr_state {
>> +	XFS_DAS_RM_SHRINK,	/* We are shrinking the tree */
>> +	XFS_DAS_RMTVAL_REMOVE,	/* We are removing remote value blocks */
>> +};
>> +
>> +/*
>> + * Defines for xfs_delattr_context.flags
>> + */
>> +#define	XFS_DAC_FINISH_TRANS	0x1 /* indicates to finish the transaction */
>> +
>> +/*
>> + * Context used for keeping track of delayed attribute operations
>> + */
>> +struct xfs_delattr_context {
>> +	struct xfs_da_state	*da_state;
>> +	struct xfs_da_state_blk *blk;
>> +	unsigned int		flags;
>> +	enum xfs_delattr_state	dela_state;
>> +};
>> +
>> +/*
>>    * Structure to ease passing around component names.
>>    */
>>   typedef struct xfs_da_args {
>> +	struct xfs_delattr_context dac; /* context used for delay attr ops */
>>   	struct xfs_da_geometry *geo;	/* da block geometry */
>>   	struct xfs_name	name;		/* name, length and argument  flags*/
>>   	uint8_t		filetype;	/* filetype of inode for directories */
>> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
>> index 1887605..9a649d1 100644
>> --- a/fs/xfs/scrub/common.c
>> +++ b/fs/xfs/scrub/common.c
>> @@ -24,6 +24,8 @@
>>   #include "xfs_rmap_btree.h"
>>   #include "xfs_log.h"
>>   #include "xfs_trans_priv.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_reflink.h"
>>   #include "scrub/scrub.h"
>> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
>> index 42ac847..d65e6d8 100644
>> --- a/fs/xfs/xfs_acl.c
>> +++ b/fs/xfs/xfs_acl.c
>> @@ -10,6 +10,8 @@
>>   #include "xfs_trans_resv.h"
>>   #include "xfs_mount.h"
>>   #include "xfs_inode.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_trace.h"
>>   #include "xfs_error.h"
>> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
>> index d37743b..881b9a4 100644
>> --- a/fs/xfs/xfs_attr_list.c
>> +++ b/fs/xfs/xfs_attr_list.c
>> @@ -12,6 +12,7 @@
>>   #include "xfs_trans_resv.h"
>>   #include "xfs_mount.h"
>>   #include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_inode.h"
>>   #include "xfs_trans.h"
>>   #include "xfs_bmap.h"
>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>> index 28c07c9..7c1d9da 100644
>> --- a/fs/xfs/xfs_ioctl.c
>> +++ b/fs/xfs/xfs_ioctl.c
>> @@ -15,6 +15,8 @@
>>   #include "xfs_iwalk.h"
>>   #include "xfs_itable.h"
>>   #include "xfs_error.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_bmap.h"
>>   #include "xfs_bmap_util.h"
>> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
>> index 769581a..d504f8f 100644
>> --- a/fs/xfs/xfs_ioctl32.c
>> +++ b/fs/xfs/xfs_ioctl32.c
>> @@ -17,6 +17,8 @@
>>   #include "xfs_itable.h"
>>   #include "xfs_fsops.h"
>>   #include "xfs_rtalloc.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_ioctl.h"
>>   #include "xfs_ioctl32.h"
>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>> index e85bbf5..a2d299f 100644
>> --- a/fs/xfs/xfs_iops.c
>> +++ b/fs/xfs/xfs_iops.c
>> @@ -13,6 +13,8 @@
>>   #include "xfs_inode.h"
>>   #include "xfs_acl.h"
>>   #include "xfs_quota.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_trans.h"
>>   #include "xfs_trace.h"
>> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
>> index 74133a5..d8dc72d 100644
>> --- a/fs/xfs/xfs_xattr.c
>> +++ b/fs/xfs/xfs_xattr.c
>> @@ -10,6 +10,7 @@
>>   #include "xfs_log_format.h"
>>   #include "xfs_da_format.h"
>>   #include "xfs_inode.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_acl.h"
>>   
>> -- 
>> 2.7.4
>>
>
Darrick J. Wong Feb. 24, 2020, 11:56 p.m. UTC | #4
On Mon, Feb 24, 2020 at 04:14:48PM -0700, Allison Collins wrote:
> On 2/24/20 8:25 AM, Brian Foster wrote:
> > On Sat, Feb 22, 2020 at 07:06:05PM -0700, Allison Collins wrote:
> > > This patch modifies the attr remove routines to be delay ready. This means they no
> > > longer roll or commit transactions, but instead return -EAGAIN to have the calling
> > > routine roll and refresh the transaction. In this series, xfs_attr_remove_args has
> > > become xfs_attr_remove_iter, which uses a sort of state machine like switch to keep
> > > track of where it was when EAGAIN was returned. xfs_attr_node_removename has also
> > > been modified to use the switch, and a  new version of xfs_attr_remove_args
> > > consists of a simple loop to refresh the transaction until the operation is
> > > completed.
> > > 
> > > This patch also adds a new struct xfs_delattr_context, which we will use to keep
> > > track of the current state of an attribute operation. The new xfs_delattr_state
> > > enum is used to track various operations that are in progress so that we know not
> > > to repeat them, and resume where we left off before EAGAIN was returned to cycle
> > > out the transaction. Other members take the place of local variables that need
> > > to retain their values across multiple function recalls.
> > > 
> > > Below is a state machine diagram for attr remove operations. The XFS_DAS_* states
> > > indicate places where the function would return -EAGAIN, and then immediately
> > > resume from after being recalled by the calling function.  States marked as a
> > > "subroutine state" indicate that they belong to a subroutine, and so the calling
> > > function needs to pass them back to that subroutine to allow it to finish where
> > > it left off. But they otherwise do not have a role in the calling function other
> > > than just passing through.
> > > 
> > >   xfs_attr_remove_iter()
> > >           XFS_DAS_RM_SHRINK     ─┐
> > >           (subroutine state)     │
> > >                                  │
> > >           XFS_DAS_RMTVAL_REMOVE ─┤
> > >           (subroutine state)     │
> > >                                  └─>xfs_attr_node_removename()
> > >                                                   │
> > >                                                   v
> > >                                           need to remove
> > >                                     ┌─n──  rmt blocks?
> > >                                     │             │
> > >                                     │             y
> > >                                     │             │
> > >                                     │             v
> > >                                     │  ┌─>XFS_DAS_RMTVAL_REMOVE
> > >                                     │  │          │
> > >                                     │  │          v
> > >                                     │  └──y── more blks
> > >                                     │         to remove?
> > >                                     │             │
> > >                                     │             n
> > >                                     │             │
> > >                                     │             v
> > >                                     │         need to
> > >                                     └─────> shrink tree? ─n─┐
> > >                                                   │         │
> > >                                                   y         │
> > >                                                   │         │
> > >                                                   v         │
> > >                                           XFS_DAS_RM_SHRINK │
> > >                                                   │         │
> > >                                                   v         │
> > >                                                  done <─────┘
> > > 
> > 
> > Wow. :P I guess I have nothing against verbose commit logs, but I wonder
> > how useful this level of documentation is for a patch that shouldn't
> > really change the existing flow of the operation.
> 
> Yes, Darrick had requested a diagram in the last review, so I put this
> together.  I wasn't sure where the best place to put it was, so I put it
> here at least for now.  I have no idea if there is a limit on commit message
> length, but if there is, I'm pretty sure I blew right past it in this patch
> and the next.  Maybe it can just stay here for now while we work
> through things?

There is no limit, as far as I'm concerned, and it's worthwhile if it
will make it easy to trace through the old attr code, the new
restartable attr code, and (eventually) the attr intent item code to
make sure that nothing fell out by accident.

--D

> > 
> > > Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> > > ---
> > >   fs/xfs/libxfs/xfs_attr.c     | 114 +++++++++++++++++++++++++++++++++++++------
> > >   fs/xfs/libxfs/xfs_attr.h     |   1 +
> > >   fs/xfs/libxfs/xfs_da_btree.h |  30 ++++++++++++
> > >   fs/xfs/scrub/common.c        |   2 +
> > >   fs/xfs/xfs_acl.c             |   2 +
> > >   fs/xfs/xfs_attr_list.c       |   1 +
> > >   fs/xfs/xfs_ioctl.c           |   2 +
> > >   fs/xfs/xfs_ioctl32.c         |   2 +
> > >   fs/xfs/xfs_iops.c            |   2 +
> > >   fs/xfs/xfs_xattr.c           |   1 +
> > >   10 files changed, 141 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > index 5d73bdf..cd3a3f7 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > @@ -368,11 +368,60 @@ xfs_has_attr(
> > >    */
> > >   int
> > >   xfs_attr_remove_args(
> > > +	struct xfs_da_args	*args)
> > > +{
> > > +	int			error = 0;
> > > +	int			err2 = 0;
> > > +
> > > +	do {
> > > +		error = xfs_attr_remove_iter(args);
> > > +		if (error && error != -EAGAIN)
> > > +			goto out;
> > > +
> > 
> > I'm a little confused on the logic of this loop given that the only
> > caller commits the transaction (which also finishes dfops). IOW, it
> > seems we shouldn't ever need to finish/roll when error != -EAGAIN. If
> > that is the case, this can be simplified to something like:
> Well, we need to do it when error == -EAGAIN or 0, right? Which I think
> better imitates the defer_finish routines.  That's why a lot of the existing
> code that just finishes off with a transaction just sort of gets sawed off
> at the end. Otherwise they would need one more state just to return -EAGAIN
> as the last thing they have to do. Did that make sense?
> 
> > 
> > int
> > xfs_attr_remove_args(
> >          struct xfs_da_args      *args)
> > {
> >          int                     error;
> > 
> >          do {
> >                  error = xfs_attr_remove_iter(args);
> >                  if (error != -EAGAIN)
> >                          break;
> > 
> >                  if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
> >                          args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
> >                          error = xfs_defer_finish(&args->trans);
> >                          if (error)
> >                                  break;
> >                  }
> > 
> >                  error = xfs_trans_roll_inode(&args->trans, args->dp);
> >                  if (error)
> >                          break;
> >          } while (true);
> > 
> >          return error;
> > }
> > 
> > That has the added benefit of eliminating the whole err2 pattern, which
> > always strikes me as a landmine.
> > 
> > > +		if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
> > 
> > BTW, _FINISH_TRANS also seems misnamed given that we finish deferred
> > operations, not necessarily the transaction. XFS_DAC_DEFER_FINISH?
> Sure, will update
> 
> > 
> > > +			args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
> > > +
> > > +			err2 = xfs_defer_finish(&args->trans);
> > > +			if (err2) {
> > > +				error = err2;
> > > +				goto out;
> > > +			}
> > > +		}
> > > +
> > > +		err2 = xfs_trans_roll_inode(&args->trans, args->dp);
> > > +		if (err2) {
> > > +			error = err2;
> > > +			goto out;
> > > +		}
> > > +
> > > +	} while (error == -EAGAIN);
> > > +out:
> > > +	return error;
> > > +}
> > > +
> > > +/*
> > > + * Remove the attribute specified in @args.
> > > + *
> > > + * This function may return -EAGAIN to signal that the transaction needs to be
> > > + * rolled.  Callers should continue calling this function until they receive a
> > > + * return value other than -EAGAIN.
> > > + */
> > > +int
> > > +xfs_attr_remove_iter(
> > >   	struct xfs_da_args      *args)
> > >   {
> > >   	struct xfs_inode	*dp = args->dp;
> > >   	int			error;
> > > +	/* State machine switch */
> > > +	switch (args->dac.dela_state) {
> > > +	case XFS_DAS_RM_SHRINK:
> > > +	case XFS_DAS_RMTVAL_REMOVE:
> > > +		goto node;
> > > +	default:
> > > +		break;
> > > +	}
> > > +
> > >   	if (!xfs_inode_hasattr(dp)) {
> > >   		error = -ENOATTR;
> > >   	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> > > @@ -381,6 +430,7 @@ xfs_attr_remove_args(
> > >   	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> > >   		error = xfs_attr_leaf_removename(args);
> > >   	} else {
> > > +node:
> > >   		error = xfs_attr_node_removename(args);
> > >   	}
> > > @@ -895,9 +945,8 @@ xfs_attr_leaf_removename(
> > >   		/* bp is gone due to xfs_da_shrink_inode */
> > >   		if (error)
> > >   			return error;
> > > -		error = xfs_defer_finish(&args->trans);
> > > -		if (error)
> > > -			return error;
> > > +
> > > +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
> > >   	}
> > >   	return 0;
> > >   }
> > > @@ -1218,6 +1267,11 @@ xfs_attr_node_addname(
> > >    * This will involve walking down the Btree, and may involve joining
> > >    * leaf nodes and even joining intermediate nodes up to and including
> > >    * the root node (a special case of an intermediate node).
> > > + *
> > > + * This routine is meant to function as either an inline or delayed operation,
> > > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > > + * functions will need to handle this, and recall the function until a
> > > + * successful error code is returned.
> > >    */
> > >   STATIC int
> > >   xfs_attr_node_removename(
> > > @@ -1230,10 +1284,24 @@ xfs_attr_node_removename(
> > >   	struct xfs_inode	*dp = args->dp;
> > >   	trace_xfs_attr_node_removename(args);
> > > +	state = args->dac.da_state;
> > > +	blk = args->dac.blk;
> > > +
> > > +	/* State machine switch */
> > > +	switch (args->dac.dela_state) {
> > > +	case XFS_DAS_RMTVAL_REMOVE:
> > > +		goto rm_node_blks;
> > > +	case XFS_DAS_RM_SHRINK:
> > > +		goto rm_shrink;
> > > +	default:
> > > +		break;
> > > +	}
> > >   	error = xfs_attr_node_hasname(args, &state);
> > >   	if (error != -EEXIST)
> > >   		goto out;
> > > +	else
> > > +		error = 0;
> > 
> > This doesn't look necessary.
> Well, at this point error has to be -EEXIST.  Which is great because we need
> the attr to exist, but we don't want to return that as error for this
> function.  Which can happen if error is not otherwise set.
> 
> > 
> > >   	/*
> > >   	 * If there is an out-of-line value, de-allocate the blocks.
> > > @@ -1243,6 +1311,14 @@ xfs_attr_node_removename(
> > >   	blk = &state->path.blk[ state->path.active-1 ];
> > >   	ASSERT(blk->bp != NULL);
> > >   	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
> > > +
> > > +	/*
> > > +	 * Store blk and state in the context incase we need to cycle out the
> > > +	 * transaction
> > > +	 */
> > > +	args->dac.blk = blk;
> > > +	args->dac.da_state = state;
> > > +
> > >   	if (args->rmtblkno > 0) {
> > >   		/*
> > >   		 * Fill in disk block numbers in the state structure
> > > @@ -1261,13 +1337,21 @@ xfs_attr_node_removename(
> > >   		if (error)
> > >   			goto out;
> > > -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > +		error = xfs_attr_rmtval_invalidate(args);
> > 
> > Remind me why we lose the above trans roll? I vaguely recall that this
> > was intentional, but I could be mistaken...
> I think we removed it in v5.  We used to have an XFS_DAS_RM_INVALIDATE
> state, but then we reasoned that because these are just in-core changes, we
> didn't need it, so we eliminated this state entirely.
> 
> Maybe I just add a comment here?  Just as a reminder.
> 
> > 
> > >   		if (error)
> > >   			goto out;
> > > +	}
> > > -		error = xfs_attr_rmtval_remove(args);
> > > -		if (error)
> > > -			goto out;
> > > +rm_node_blks:
> > > +
> > > +	if (args->rmtblkno > 0) {
> > > +		error = xfs_attr_rmtval_unmap(args);
> > > +
> > > +		if (error) {
> > > +			if (error == -EAGAIN)
> > > +				args->dac.dela_state = XFS_DAS_RMTVAL_REMOVE;
> > 
> > Might be helpful for the code labels to match the state names. I.e., use
> > das_rmtval_remove: for the label above.
> Sure, I can update to add the das prefix.
> 
> > 
> > > +			return error;
> > > +		}
> > >   		/*
> > >   		 * Refill the state structure with buffers, the prior calls
> > > @@ -1293,17 +1377,15 @@ xfs_attr_node_removename(
> > >   		error = xfs_da3_join(state);
> > >   		if (error)
> > >   			goto out;
> > > -		error = xfs_defer_finish(&args->trans);
> > > -		if (error)
> > > -			goto out;
> > > -		/*
> > > -		 * Commit the Btree join operation and start a new trans.
> > > -		 */
> > > -		error = xfs_trans_roll_inode(&args->trans, dp);
> > > -		if (error)
> > > -			goto out;
> > > +
> > > +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
> > > +		args->dac.dela_state = XFS_DAS_RM_SHRINK;
> > > +		return -EAGAIN;
> > >   	}
> > > +rm_shrink:
> > > +	args->dac.dela_state = XFS_DAS_RM_SHRINK;
> > > +
> > 
> > There's an xfs_defer_finish() call further down this function. Should
> > that be replaced with the flag?
> > 
> > Finally, I mentioned in a previous review that this function should
> > probably be further broken down before fitting in the state management
> > stuff. It doesn't look like that happened so I've attached a diff that
> > is just intended to give an idea of what I mean by sectioning off the
> > hunks that might be able to break down into helpers. The helpers
> > wouldn't contain any state management, so we create a clear separation
> > between the state code and functional components.
> Yes, it's xfs_attr_node_shrink in patch 15.  I moved it to another patch to
> try and keep the activity in this one to a minimum.  Apologies if it
> surprised you!  And then I mistakenly had taken the XFS_DAC_FINISH_TRANS
> flag with it.  I meant to keep all the state machine stuff here.  Will fix!
> 
> > I think this initial
> > refactoring would make the introduction of state much more simple
> 
> I guess I didn't think people would be partial to introducing helpers before
> or after the state logic.  I put them after in this set because the states
> are visible now, so I thought it would make the goal of modularizing code
> between the states more clear to folks.  Do you think I should move it back
> behind the state machine patches?
> 
> > (and
> > perhaps alleviate the need for the huge diagram).
> Well, I get the impression that people find the series sort of scary and
> maybe the diagrams help them a bit.  Maybe we can take them out later after
> people feel like they are comfortable with things?
> 
> > It might also be
> > interesting to see how much of the result could be folded up further
> > into _removename_iter()...
> 
> Yes, I think that is the goal we're reaching for.  I will add the other
> helpers I see in your diff too.
> 
> Thanks for the reviews!
> Allison
> 
> > 
> > Brian
> > 
> > >   	/*
> > >   	 * If the result is small enough, push it all into the inode.
> > >   	 */
> > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > index ce7b039..ea873a5 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > @@ -155,6 +155,7 @@ int xfs_attr_set_args(struct xfs_da_args *args);
> > >   int xfs_attr_remove(struct xfs_inode *dp, struct xfs_name *name, int flags);
> > >   int xfs_has_attr(struct xfs_da_args *args);
> > >   int xfs_attr_remove_args(struct xfs_da_args *args);
> > > +int xfs_attr_remove_iter(struct xfs_da_args *args);
> > >   int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
> > >   		  int flags, struct attrlist_cursor_kern *cursor);
> > >   bool xfs_attr_namecheck(const void *name, size_t length);
> > > diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
> > > index 14f1be3..3c78498 100644
> > > --- a/fs/xfs/libxfs/xfs_da_btree.h
> > > +++ b/fs/xfs/libxfs/xfs_da_btree.h
> > > @@ -50,9 +50,39 @@ enum xfs_dacmp {
> > >   };
> > >   /*
> > > + * Enum values for xfs_delattr_context.da_state
> > > + *
> > > + * These values are used by delayed attribute operations to keep track  of where
> > > + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> > > + * calling function to roll the transaction, and then recall the subroutine to
> > > + * finish the operation.  The enum is then used by the subroutine to jump back
> > > + * to where it was and resume executing where it left off.
> > > + */
> > > +enum xfs_delattr_state {
> > > +	XFS_DAS_RM_SHRINK,	/* We are shrinking the tree */
> > > +	XFS_DAS_RMTVAL_REMOVE,	/* We are removing remote value blocks */
> > > +};
> > > +
> > > +/*
> > > + * Defines for xfs_delattr_context.flags
> > > + */
> > > +#define	XFS_DAC_FINISH_TRANS	0x1 /* indicates to finish the transaction */
> > > +
> > > +/*
> > > + * Context used for keeping track of delayed attribute operations
> > > + */
> > > +struct xfs_delattr_context {
> > > +	struct xfs_da_state	*da_state;
> > > +	struct xfs_da_state_blk *blk;
> > > +	unsigned int		flags;
> > > +	enum xfs_delattr_state	dela_state;
> > > +};
> > > +
> > > +/*
> > >    * Structure to ease passing around component names.
> > >    */
> > >   typedef struct xfs_da_args {
> > > +	struct xfs_delattr_context dac; /* context used for delay attr ops */
> > >   	struct xfs_da_geometry *geo;	/* da block geometry */
> > >   	struct xfs_name	name;		/* name, length and argument  flags*/
> > >   	uint8_t		filetype;	/* filetype of inode for directories */
> > > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > > index 1887605..9a649d1 100644
> > > --- a/fs/xfs/scrub/common.c
> > > +++ b/fs/xfs/scrub/common.c
> > > @@ -24,6 +24,8 @@
> > >   #include "xfs_rmap_btree.h"
> > >   #include "xfs_log.h"
> > >   #include "xfs_trans_priv.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_reflink.h"
> > >   #include "scrub/scrub.h"
> > > diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> > > index 42ac847..d65e6d8 100644
> > > --- a/fs/xfs/xfs_acl.c
> > > +++ b/fs/xfs/xfs_acl.c
> > > @@ -10,6 +10,8 @@
> > >   #include "xfs_trans_resv.h"
> > >   #include "xfs_mount.h"
> > >   #include "xfs_inode.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_trace.h"
> > >   #include "xfs_error.h"
> > > diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> > > index d37743b..881b9a4 100644
> > > --- a/fs/xfs/xfs_attr_list.c
> > > +++ b/fs/xfs/xfs_attr_list.c
> > > @@ -12,6 +12,7 @@
> > >   #include "xfs_trans_resv.h"
> > >   #include "xfs_mount.h"
> > >   #include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_inode.h"
> > >   #include "xfs_trans.h"
> > >   #include "xfs_bmap.h"
> > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > index 28c07c9..7c1d9da 100644
> > > --- a/fs/xfs/xfs_ioctl.c
> > > +++ b/fs/xfs/xfs_ioctl.c
> > > @@ -15,6 +15,8 @@
> > >   #include "xfs_iwalk.h"
> > >   #include "xfs_itable.h"
> > >   #include "xfs_error.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_bmap.h"
> > >   #include "xfs_bmap_util.h"
> > > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > > index 769581a..d504f8f 100644
> > > --- a/fs/xfs/xfs_ioctl32.c
> > > +++ b/fs/xfs/xfs_ioctl32.c
> > > @@ -17,6 +17,8 @@
> > >   #include "xfs_itable.h"
> > >   #include "xfs_fsops.h"
> > >   #include "xfs_rtalloc.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_ioctl.h"
> > >   #include "xfs_ioctl32.h"
> > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > index e85bbf5..a2d299f 100644
> > > --- a/fs/xfs/xfs_iops.c
> > > +++ b/fs/xfs/xfs_iops.c
> > > @@ -13,6 +13,8 @@
> > >   #include "xfs_inode.h"
> > >   #include "xfs_acl.h"
> > >   #include "xfs_quota.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_trans.h"
> > >   #include "xfs_trace.h"
> > > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> > > index 74133a5..d8dc72d 100644
> > > --- a/fs/xfs/xfs_xattr.c
> > > +++ b/fs/xfs/xfs_xattr.c
> > > @@ -10,6 +10,7 @@
> > >   #include "xfs_log_format.h"
> > >   #include "xfs_da_format.h"
> > >   #include "xfs_inode.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_acl.h"
> > > -- 
> > > 2.7.4
> > > 
> >
Dave Chinner Feb. 25, 2020, 8:57 a.m. UTC | #5
On Sat, Feb 22, 2020 at 07:06:05PM -0700, Allison Collins wrote:
> This patch modifies the attr remove routines to be delay ready. This means they no
> longer roll or commit transactions, but instead return -EAGAIN to have the calling
> routine roll and refresh the transaction. In this series, xfs_attr_remove_args has
> become xfs_attr_remove_iter, which uses a sort of state machine like switch to keep
> track of where it was when EAGAIN was returned. xfs_attr_node_removename has also
> been modified to use the switch, and a  new version of xfs_attr_remove_args
> consists of a simple loop to refresh the transaction until the operation is
> completed.
> 
> This patch also adds a new struct xfs_delattr_context, which we will use to keep
> track of the current state of an attribute operation. The new xfs_delattr_state
> enum is used to track various operations that are in progress so that we know not
> to repeat them, and resume where we left off before EAGAIN was returned to cycle
> out the transaction. Other members take the place of local variables that need
> to retain their values across multiple function recalls.
> 
> Below is a state machine diagram for attr remove operations. The XFS_DAS_* states

Ok, so I find all the DA/da prefixes in this code confusing,
especially as they have very similar actual names. e.g. da_state
vs delattr_state, DAS vs DA_STATE, etc.

Basically, I can't tell from reading the code what "DA" the actual
variable belongs to, and in a few months time I'll most definitely
have forgotten and have to relearn it from scratch.

So while "Delayed Attributes" is a great name for the feature, I
don't think it makes a great acronym for shortening variable names
because of the conflict with the existing DA namespace prefix.

Also, "dac" as shorthand for delattr context is also overloaded.
"DAC" is "discretionary access control" and is quite widely used
in the kernel (e.g. CAP_DAC_READ_SEARCH, CAP_DAC_OVERRIDE) so again
I read this code and it doesn't make much sense.

I haven't come up with a better name - "attribute iterator" is the
best I've managed (marketing++ - XFS has AI now!) and shortening it
down to ator would go a long way to alleviating my namespace
confusion....

> indicate places where the function would return -EAGAIN, and then immediately
> resume from after being recalled by the calling function.  States marked as a
> "subroutine state" indicate that they belong to a subroutine, and so the calling
> function needs to pass them back to that subroutine to allow it to finish where
> it left off. But they otherwise do not have a role in the calling function other
> than just passing through.
> 
>  xfs_attr_remove_iter()
>          XFS_DAS_RM_SHRINK     ─┐
>          (subroutine state)     │
>                                 │
>          XFS_DAS_RMTVAL_REMOVE ─┤
>          (subroutine state)     │
>                                 └─>xfs_attr_node_removename()
>                                                  │
>                                                  v
>                                          need to remove
>                                    ┌─n──  rmt blocks?
>                                    │             │
>                                    │             y
>                                    │             │
>                                    │             v
>                                    │  ┌─>XFS_DAS_RMTVAL_REMOVE
>                                    │  │          │
>                                    │  │          v
>                                    │  └──y── more blks
>                                    │         to remove?
>                                    │             │
>                                    │             n
>                                    │             │
>                                    │             v
>                                    │         need to
>                                    └─────> shrink tree? ─n─┐
>                                                  │         │
>                                                  y         │
>                                                  │         │
>                                                  v         │
>                                          XFS_DAS_RM_SHRINK │
>                                                  │         │
>                                                  v         │
>                                                 done <─────┘

Nice.

> 
> Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c     | 114 +++++++++++++++++++++++++++++++++++++------
>  fs/xfs/libxfs/xfs_attr.h     |   1 +
>  fs/xfs/libxfs/xfs_da_btree.h |  30 ++++++++++++
>  fs/xfs/scrub/common.c        |   2 +
>  fs/xfs/xfs_acl.c             |   2 +
>  fs/xfs/xfs_attr_list.c       |   1 +
>  fs/xfs/xfs_ioctl.c           |   2 +
>  fs/xfs/xfs_ioctl32.c         |   2 +
>  fs/xfs/xfs_iops.c            |   2 +
>  fs/xfs/xfs_xattr.c           |   1 +
>  10 files changed, 141 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 5d73bdf..cd3a3f7 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -368,11 +368,60 @@ xfs_has_attr(
>   */
>  int
>  xfs_attr_remove_args(
> +	struct xfs_da_args	*args)
> +{
> +	int			error = 0;
> +	int			err2 = 0;
> +
> +	do {
> +		error = xfs_attr_remove_iter(args);
> +		if (error && error != -EAGAIN)
> +			goto out;
> +
> +		if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
> +			args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
> +
> +			err2 = xfs_defer_finish(&args->trans);
> +			if (err2) {
> +				error = err2;
> +				goto out;
> +			}
> +		}
> +
> +		err2 = xfs_trans_roll_inode(&args->trans, args->dp);
> +		if (err2) {
> +			error = err2;
> +			goto out;
> +		}
> +
> +	} while (error == -EAGAIN);
> +out:
> +	return error;
> +}

Brian commented on the structure of this loop better than I could.

> +
> +/*
> + * Remove the attribute specified in @args.
> + *
> + * This function may return -EAGAIN to signal that the transaction needs to be
> + * rolled.  Callers should continue calling this function until they receive a
> + * return value other than -EAGAIN.
> + */
> +int
> +xfs_attr_remove_iter(
>  	struct xfs_da_args      *args)
>  {
>  	struct xfs_inode	*dp = args->dp;
>  	int			error;
>  
> +	/* State machine switch */
> +	switch (args->dac.dela_state) {
> +	case XFS_DAS_RM_SHRINK:
> +	case XFS_DAS_RMTVAL_REMOVE:
> +		goto node;
> +	default:
> +		break;
> +	}

Why separate out the state machine? Doesn't this shortcut the
xfs_inode_hasattr() check? Shouldn't that come first?

As it is:

	case XFS_DAS_RM_SHRINK:
	case XFS_DAS_RMTVAL_REMOVE:
		return xfs_attr_node_removename(args);
	default:
		break;

would be nicer, and if this is the only way we can get to
xfs_attr_node_removename(), getting rid of it from the code
below could be done, too.
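
To make the shape of that dispatch concrete, here is a rough userspace
mock-up. It is only a sketch: every mock_* name, the format enum and the
call counter are invented for illustration and are not the real XFS
symbols, and -ENODATA merely stands in for -ENOATTR. It keeps the resume
dispatch first, as in the snippet above, and does not try to settle whether
the xfs_inode_hasattr() check should move ahead of it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace mock-up: none of these names are the real XFS ones. */
enum mock_state { MOCK_UNINIT = 0, MOCK_RMTVAL_REMOVE, MOCK_RM_SHRINK };
enum mock_format { MOCK_FMT_NONE, MOCK_FMT_LOCAL, MOCK_FMT_LEAF, MOCK_FMT_NODE };

struct mock_args {
	enum mock_state		state;		/* where the last call left off */
	enum mock_format	format;		/* stands in for the attr fork checks */
	int			node_calls;	/* how often the node path ran */
};

/* Stand-in for the node removal path: asks for two rolls, then finishes. */
static int mock_node_removename(struct mock_args *args)
{
	args->node_calls++;
	switch (args->state) {
	case MOCK_UNINIT:
		args->state = MOCK_RMTVAL_REMOVE;
		return -EAGAIN;
	case MOCK_RMTVAL_REMOVE:
		args->state = MOCK_RM_SHRINK;
		return -EAGAIN;
	case MOCK_RM_SHRINK:
		return 0;
	}
	return -EINVAL;
}

static int mock_remove_iter(struct mock_args *args)
{
	assert(args != NULL);

	/* Resume a node removal in progress: no goto, no format re-check. */
	switch (args->state) {
	case MOCK_RM_SHRINK:
	case MOCK_RMTVAL_REMOVE:
		return mock_node_removename(args);
	default:
		break;
	}

	/* First call: pick the removal path from the attr fork format. */
	switch (args->format) {
	case MOCK_FMT_NONE:
		return -ENODATA;	/* placeholder for -ENOATTR */
	case MOCK_FMT_LOCAL:
	case MOCK_FMT_LEAF:
		return 0;		/* shortform/leaf removal elided */
	case MOCK_FMT_NODE:
		return mock_node_removename(args);
	}
	return -EINVAL;
}
```

Once a state has been saved, every later call goes straight to the node
helper, so the format checks only ever run on the first pass.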


> +
>  	if (!xfs_inode_hasattr(dp)) {
>  		error = -ENOATTR;
>  	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> @@ -381,6 +430,7 @@ xfs_attr_remove_args(
>  	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>  		error = xfs_attr_leaf_removename(args);
>  	} else {
> +node:
>  		error = xfs_attr_node_removename(args);
>  	}
>  
> @@ -895,9 +945,8 @@ xfs_attr_leaf_removename(
>  		/* bp is gone due to xfs_da_shrink_inode */
>  		if (error)
>  			return error;
> -		error = xfs_defer_finish(&args->trans);
> -		if (error)
> -			return error;
> +
> +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
>  	}
>  	return 0;
>  }
> @@ -1218,6 +1267,11 @@ xfs_attr_node_addname(
>   * This will involve walking down the Btree, and may involve joining
>   * leaf nodes and even joining intermediate nodes up to and including
>   * the root node (a special case of an intermediate node).
> + *
> + * This routine is meant to function as either an inline or delayed operation,
> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> + * functions will need to handle this, and recall the function until a
> + * successful error code is returned.
>   */
>  STATIC int
>  xfs_attr_node_removename(
> @@ -1230,10 +1284,24 @@ xfs_attr_node_removename(
>  	struct xfs_inode	*dp = args->dp;
>  
>  	trace_xfs_attr_node_removename(args);
> +	state = args->dac.da_state;
> +	blk = args->dac.blk;
> +
> +	/* State machine switch */
> +	switch (args->dac.dela_state) {
> +	case XFS_DAS_RMTVAL_REMOVE:
> +		goto rm_node_blks;
> +	case XFS_DAS_RM_SHRINK:
> +		goto rm_shrink;
> +	default:
> +		break;
> +	}

This really is calling out for this function to be broken into three
smaller functions. That would greatly simplify the code flow and
logic here.
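
As a rough illustration of that split, the userspace mock-up below keeps
all state handling in one small switch and pushes the actual work into
helpers that know nothing about states. It is a sketch under assumptions:
the mock_* helpers, the counters and the three-way split (lookup, remote
value unmap, entry removal plus shrink) are made up here and only
approximate what the real function does.

```c
#include <assert.h>
#include <errno.h>

/* Userspace mock-up; every name here is invented for illustration. */
enum mock_state { MOCK_UNINIT = 0, MOCK_RMTVAL_REMOVE, MOCK_RM_SHRINK };

struct mock_ctx {
	enum mock_state	state;
	int		rmt_blocks;	/* remote value blocks still mapped */
	int		needs_join;	/* removing the entry triggers a join */
	int		lookups, unmaps, shrinks;	/* helper call counters */
};

/* Helper 1: find the attr and set up the path (no state handling). */
static int mock_lookup(struct mock_ctx *ctx)
{
	ctx->lookups++;
	return 0;
}

/* Helper 2: unmap one remote block per call, -EAGAIN while more remain. */
static int mock_rmtval_unmap(struct mock_ctx *ctx)
{
	assert(ctx->rmt_blocks >= 0);
	if (ctx->rmt_blocks == 0)
		return 0;
	ctx->rmt_blocks--;
	ctx->unmaps++;
	return ctx->rmt_blocks ? -EAGAIN : 0;
}

/* Helper 3a: remove the entry; returns 1 if a join was done. */
static int mock_remove_entry(struct mock_ctx *ctx)
{
	int joined = ctx->needs_join;

	ctx->needs_join = 0;
	return joined;
}

/* Helper 3b: push the result into the inode if it is small enough. */
static int mock_shrink(struct mock_ctx *ctx)
{
	ctx->shrinks++;
	return 0;
}

/* The only function that knows about states. */
static int mock_node_removename_iter(struct mock_ctx *ctx)
{
	int error;

	switch (ctx->state) {
	case MOCK_UNINIT:
		error = mock_lookup(ctx);
		if (error)
			return error;
		ctx->state = MOCK_RMTVAL_REMOVE;
		/* fall through */
	case MOCK_RMTVAL_REMOVE:
		error = mock_rmtval_unmap(ctx);
		if (error)
			return error;	/* -EAGAIN: same state next call */
		if (mock_remove_entry(ctx)) {
			ctx->state = MOCK_RM_SHRINK;
			return -EAGAIN;	/* roll before shrinking */
		}
		/* fall through */
	case MOCK_RM_SHRINK:
		return mock_shrink(ctx);
	}
	return -EINVAL;
}
```

With two remote blocks and a join pending, this takes three calls: one
lookup, two unmaps and one shrink, with a roll requested between each call.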

> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index ce7b039..ea873a5 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -155,6 +155,7 @@ int xfs_attr_set_args(struct xfs_da_args *args);
>  int xfs_attr_remove(struct xfs_inode *dp, struct xfs_name *name, int flags);
>  int xfs_has_attr(struct xfs_da_args *args);
>  int xfs_attr_remove_args(struct xfs_da_args *args);
> +int xfs_attr_remove_iter(struct xfs_da_args *args);
>  int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
>  		  int flags, struct attrlist_cursor_kern *cursor);
>  bool xfs_attr_namecheck(const void *name, size_t length);
> diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
> index 14f1be3..3c78498 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.h
> +++ b/fs/xfs/libxfs/xfs_da_btree.h
> @@ -50,9 +50,39 @@ enum xfs_dacmp {
>  };
>  
>  /*
> + * Enum values for xfs_delattr_context.da_state
> + *
> + * These values are used by delayed attribute operations to keep track  of where
> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> + * calling function to roll the transaction, and then recall the subroutine to
> + * finish the operation.  The enum is then used by the subroutine to jump back
> + * to where it was and resume executing where it left off.
> + */
> +enum xfs_delattr_state {
> +	XFS_DAS_RM_SHRINK,	/* We are shrinking the tree */
> +	XFS_DAS_RMTVAL_REMOVE,	/* We are removing remote value blocks */
> +};
> +
> +/*
> + * Defines for xfs_delattr_context.flags
> + */
> +#define	XFS_DAC_FINISH_TRANS	0x1 /* indicates to finish the transaction */
> +
> +/*
> + * Context used for keeping track of delayed attribute operations
> + */
> +struct xfs_delattr_context {
> +	struct xfs_da_state	*da_state;
> +	struct xfs_da_state_blk *blk;
> +	unsigned int		flags;
> +	enum xfs_delattr_state	dela_state;
> +};
> +
> +/*
>   * Structure to ease passing around component names.
>   */
>  typedef struct xfs_da_args {
> +	struct xfs_delattr_context dac; /* context used for delay attr ops */

Probably should put this at the end of the structure rather than the
front.

I'm also wondering if it should be kept separate to the da_args and
contain a pointer to the da_args instead of being wrapped inside
them.

i.e. we put the iterating state structure on the stack, then

	struct attr_iter	ater = {
		.da_args = args,
	};

	do {
		error = xfs_attr_remove_iter(&ater);
		.....
	
And that largely separates the delayed attribute iteration state
from the da_args that holds the internal attribute manipulation
information.
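
Fleshed out a little as a userspace mock-up, that could look roughly like
the following. This is a sketch only: the struct layout, the
MOCK_ITER_DEFER_FINISH flag, the steps_left counter (a stand-in for the
state enum) and all mock_* functions are placeholders invented here, and
the loop body simply follows the shape Brian suggested.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace mock-up; names and fields are placeholders, not XFS ones. */
struct mock_da_args {
	int	rolls;			/* transaction rolls performed */
	int	defer_finishes;		/* deferred-op finishes performed */
};

#define MOCK_ITER_DEFER_FINISH	0x1	/* ask the caller to finish dfops */

/* Iteration state lives here, not inside the da_args. */
struct mock_attr_iter {
	struct mock_da_args	*da_args;	/* attr manipulation info */
	unsigned int		flags;		/* requests for the caller */
	int			steps_left;	/* stand-in for the state enum */
};

static int mock_remove_iter(struct mock_attr_iter *iter)
{
	assert(iter->da_args != NULL);
	if (iter->steps_left == 0)
		return 0;
	iter->steps_left--;
	/* Pretend the final step queued deferred work. */
	if (iter->steps_left == 0)
		iter->flags |= MOCK_ITER_DEFER_FINISH;
	return -EAGAIN;
}

static int mock_defer_finish(struct mock_da_args *args)
{
	args->defer_finishes++;
	return 0;
}

static int mock_trans_roll(struct mock_da_args *args)
{
	args->rolls++;
	return 0;
}

static int mock_remove_args(struct mock_da_args *args, int steps)
{
	/* The iterator sits on the caller's stack and points at the args. */
	struct mock_attr_iter	iter = {
		.da_args	= args,
		.steps_left	= steps,
	};
	int			error;

	do {
		error = mock_remove_iter(&iter);
		if (error != -EAGAIN)
			break;

		if (iter.flags & MOCK_ITER_DEFER_FINISH) {
			iter.flags &= ~MOCK_ITER_DEFER_FINISH;
			error = mock_defer_finish(iter.da_args);
			if (error)
				break;
		}

		error = mock_trans_roll(iter.da_args);
		if (error)
			break;
	} while (1);

	return error;
}
```

The point of the layout is that the da_args never carries resume state, so
code that only manipulates attributes does not need to know an iteration is
in progress.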

>  	struct xfs_da_geometry *geo;	/* da block geometry */
>  	struct xfs_name	name;		/* name, length and argument  flags*/
>  	uint8_t		filetype;	/* filetype of inode for directories */
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 1887605..9a649d1 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -24,6 +24,8 @@
>  #include "xfs_rmap_btree.h"
>  #include "xfs_log.h"
>  #include "xfs_trans_priv.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_reflink.h"
>  #include "scrub/scrub.h"

Hmmm - why are these new includes necessary? You didn't add anything
new to these files or common header files to make the includes
needed....

Cheers,

Dave.
Brian Foster Feb. 25, 2020, 1:34 p.m. UTC | #6
On Mon, Feb 24, 2020 at 04:14:48PM -0700, Allison Collins wrote:
> On 2/24/20 8:25 AM, Brian Foster wrote:
> > On Sat, Feb 22, 2020 at 07:06:05PM -0700, Allison Collins wrote:
> > > This patch modifies the attr remove routines to be delay ready. This means they no
> > > longer roll or commit transactions, but instead return -EAGAIN to have the calling
> > > routine roll and refresh the transaction. In this series, xfs_attr_remove_args has
> > > become xfs_attr_remove_iter, which uses a sort of state machine like switch to keep
> > > track of where it was when EAGAIN was returned. xfs_attr_node_removename has also
> > > been modified to use the switch, and a  new version of xfs_attr_remove_args
> > > consists of a simple loop to refresh the transaction until the operation is
> > > completed.
> > > 
> > > This patch also adds a new struct xfs_delattr_context, which we will use to keep
> > > track of the current state of an attribute operation. The new xfs_delattr_state
> > > enum is used to track various operations that are in progress so that we know not
> > > to repeat them, and resume where we left off before EAGAIN was returned to cycle
> > > out the transaction. Other members take the place of local variables that need
> > > to retain their values across multiple function recalls.
> > > 
> > > Below is a state machine diagram for attr remove operations. The XFS_DAS_* states
> > > indicate places where the function would return -EAGAIN, and then immediately
> > > resume from after being recalled by the calling function.  States marked as a
> > > "subroutine state" indicate that they belong to a subroutine, and so the calling
> > > function needs to pass them back to that subroutine to allow it to finish where
> > > it left off. But they otherwise do not have a role in the calling function other
> > > than just passing through.
> > > 
> > >   xfs_attr_remove_iter()
> > >           XFS_DAS_RM_SHRINK     ─┐
> > >           (subroutine state)     │
> > >                                  │
> > >           XFS_DAS_RMTVAL_REMOVE ─┤
> > >           (subroutine state)     │
> > >                                  └─>xfs_attr_node_removename()
> > >                                                   │
> > >                                                   v
> > >                                           need to remove
> > >                                     ┌─n──  rmt blocks?
> > >                                     │             │
> > >                                     │             y
> > >                                     │             │
> > >                                     │             v
> > >                                     │  ┌─>XFS_DAS_RMTVAL_REMOVE
> > >                                     │  │          │
> > >                                     │  │          v
> > >                                     │  └──y── more blks
> > >                                     │         to remove?
> > >                                     │             │
> > >                                     │             n
> > >                                     │             │
> > >                                     │             v
> > >                                     │         need to
> > >                                     └─────> shrink tree? ─n─┐
> > >                                                   │         │
> > >                                                   y         │
> > >                                                   │         │
> > >                                                   v         │
> > >                                           XFS_DAS_RM_SHRINK │
> > >                                                   │         │
> > >                                                   v         │
> > >                                                  done <─────┘
> > > 
> > 
> > Wow. :P I guess I have nothing against verbose commit logs, but I wonder
> > how useful this level of documentation is for a patch that shouldn't
> > really change the existing flow of the operation.
> 
> Yes Darrick had requested a diagram in the last review, so I had put this
> together.  I wasn't sure where the best place to put it even was, so I put it
> here at least for now.  I have no idea if there is a limit on commit message
> length, but if there is, I'm pretty sure I blew right past it in this patch
> and the next.  Maybe if anything it can just be here for now while we work
> through things?
> 

No problem.. if it's useful it's good to have a record of it around
somewhere until the end result is more stabilized and we can determine
whether this warrants a permanent home somewhere in the code.

> > 
> > > Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> > > ---
> > >   fs/xfs/libxfs/xfs_attr.c     | 114 +++++++++++++++++++++++++++++++++++++------
> > >   fs/xfs/libxfs/xfs_attr.h     |   1 +
> > >   fs/xfs/libxfs/xfs_da_btree.h |  30 ++++++++++++
> > >   fs/xfs/scrub/common.c        |   2 +
> > >   fs/xfs/xfs_acl.c             |   2 +
> > >   fs/xfs/xfs_attr_list.c       |   1 +
> > >   fs/xfs/xfs_ioctl.c           |   2 +
> > >   fs/xfs/xfs_ioctl32.c         |   2 +
> > >   fs/xfs/xfs_iops.c            |   2 +
> > >   fs/xfs/xfs_xattr.c           |   1 +
> > >   10 files changed, 141 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > index 5d73bdf..cd3a3f7 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > @@ -368,11 +368,60 @@ xfs_has_attr(
> > >    */
> > >   int
> > >   xfs_attr_remove_args(
> > > +	struct xfs_da_args	*args)
> > > +{
> > > +	int			error = 0;
> > > +	int			err2 = 0;
> > > +
> > > +	do {
> > > +		error = xfs_attr_remove_iter(args);
> > > +		if (error && error != -EAGAIN)
> > > +			goto out;
> > > +
> > 
> > I'm a little confused on the logic of this loop given that the only
> > caller commits the transaction (which also finishes dfops). IOW, it
> > seems we shouldn't ever need to finish/roll when error != -EAGAIN. If
> > that is the case, this can be simplified to something like:
> Well, we need to do it when error == -EAGAIN or 0, right? Which I think
> better imitates the defer_finish routines.  That's why a lot of the existing
> code that just finishes off with a transaction just sort of gets sawed off
> at the end. Otherwise they would need one more state just to return -EAGAIN
> as the last thing they have to do. Did that make sense?
> 

Hmm.. I could just be missing something or not far along enough in the
series. Can you point me at an example of where we need to finish/roll
before the caller of xfs_attr_remove_args() commits the transaction?
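
To make the question concrete, here is a rough user-space mock of a loop
that finishes/rolls only when the iter function returns -EAGAIN. It is a
sketch, not XFS code: struct mock_args, MOCK_DEFER_FINISH and all of the
mock_* functions are made-up stand-ins, and the counters exist only so
the behaviour can be observed.

```c
#include <errno.h>

/* Hypothetical user-space stand-ins; none of these are XFS definitions. */
#define MOCK_DEFER_FINISH	0x1

struct mock_args {
	unsigned int	flags;		/* MOCK_DEFER_FINISH when set by the iter */
	int		steps_left;	/* steps that still need a fresh transaction */
	int		rolls;		/* number of mock transaction rolls */
	int		finishes;	/* number of mock deferred-op finishes */
};

/* One step of work: asks for another pass until no steps remain. */
static int mock_remove_iter(struct mock_args *args)
{
	if (args->steps_left == 0)
		return 0;
	args->steps_left--;
	args->flags |= MOCK_DEFER_FINISH;
	return -EAGAIN;
}

static int mock_defer_finish(struct mock_args *args)
{
	args->finishes++;
	return 0;
}

static int mock_roll(struct mock_args *args)
{
	args->rolls++;
	return 0;
}

/* Roll only on -EAGAIN; success or a real error ends the loop at once. */
static int mock_remove_args(struct mock_args *args)
{
	int error;

	do {
		error = mock_remove_iter(args);
		if (error != -EAGAIN)
			break;

		if (args->flags & MOCK_DEFER_FINISH) {
			args->flags &= ~MOCK_DEFER_FINISH;
			error = mock_defer_finish(args);
			if (error)
				break;
		}

		error = mock_roll(args);
		if (error)
			break;
	} while (1);

	return error;
}
```

With two pending steps the mock rolls twice and returns 0 on the third
pass without a further roll; in this shape the final commit is left to
whoever called the loop.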

> > 
> > int
> > xfs_attr_remove_args(
> >          struct xfs_da_args      *args)
> > {
> >          int                     error;
> > 
> >          do {
> >                  error = xfs_attr_remove_iter(args);
> >                  if (error != -EAGAIN)
> >                          break;
> > 
> >                  if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
> >                          args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
> >                          error = xfs_defer_finish(&args->trans);
> >                          if (error)
> >                                  break;
> >                  }
> > 
> >                  error = xfs_trans_roll_inode(&args->trans, args->dp);
> >                  if (error)
> >                          break;
> >          } while (true);
> > 
> >          return error;
> > }
> > 
> > That has the added benefit of eliminating the whole err2 pattern, which
> > always strikes me as a landmine.
> > 
> > > +		if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
> > 
> > BTW, _FINISH_TRANS also seems misnamed given that we finish deferred
> > operations, not necessarily the transaction. XFS_DAC_DEFER_FINISH?
> Sure, will update
> 
> > 
> > > +			args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
> > > +
> > > +			err2 = xfs_defer_finish(&args->trans);
> > > +			if (err2) {
> > > +				error = err2;
> > > +				goto out;
> > > +			}
> > > +		}
> > > +
> > > +		err2 = xfs_trans_roll_inode(&args->trans, args->dp);
> > > +		if (err2) {
> > > +			error = err2;
> > > +			goto out;
> > > +		}
> > > +
> > > +	} while (error == -EAGAIN);
> > > +out:
> > > +	return error;
> > > +}
> > > +
> > > +/*
> > > + * Remove the attribute specified in @args.
> > > + *
> > > + * This function may return -EAGAIN to signal that the transaction needs to be
> > > + * rolled.  Callers should continue calling this function until they receive a
> > > + * return value other than -EAGAIN.
> > > + */
> > > +int
> > > +xfs_attr_remove_iter(
> > >   	struct xfs_da_args      *args)
> > >   {
> > >   	struct xfs_inode	*dp = args->dp;
> > >   	int			error;
> > > +	/* State machine switch */
> > > +	switch (args->dac.dela_state) {
> > > +	case XFS_DAS_RM_SHRINK:
> > > +	case XFS_DAS_RMTVAL_REMOVE:
> > > +		goto node;
> > > +	default:
> > > +		break;
> > > +	}
> > > +
> > >   	if (!xfs_inode_hasattr(dp)) {
> > >   		error = -ENOATTR;
> > >   	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> > > @@ -381,6 +430,7 @@ xfs_attr_remove_args(
> > >   	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> > >   		error = xfs_attr_leaf_removename(args);
> > >   	} else {
> > > +node:
> > >   		error = xfs_attr_node_removename(args);
> > >   	}
> > > @@ -895,9 +945,8 @@ xfs_attr_leaf_removename(
> > >   		/* bp is gone due to xfs_da_shrink_inode */
> > >   		if (error)
> > >   			return error;
> > > -		error = xfs_defer_finish(&args->trans);
> > > -		if (error)
> > > -			return error;
> > > +
> > > +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
> > >   	}
> > >   	return 0;
> > >   }
> > > @@ -1218,6 +1267,11 @@ xfs_attr_node_addname(
> > >    * This will involve walking down the Btree, and may involve joining
> > >    * leaf nodes and even joining intermediate nodes up to and including
> > >    * the root node (a special case of an intermediate node).
> > > + *
> > > + * This routine is meant to function as either an inline or delayed operation,
> > > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > > + * functions will need to handle this, and recall the function until a
> > > + * successful error code is returned.
> > >    */
> > >   STATIC int
> > >   xfs_attr_node_removename(
> > > @@ -1230,10 +1284,24 @@ xfs_attr_node_removename(
> > >   	struct xfs_inode	*dp = args->dp;
> > >   	trace_xfs_attr_node_removename(args);
> > > +	state = args->dac.da_state;
> > > +	blk = args->dac.blk;
> > > +
> > > +	/* State machine switch */
> > > +	switch (args->dac.dela_state) {
> > > +	case XFS_DAS_RMTVAL_REMOVE:
> > > +		goto rm_node_blks;
> > > +	case XFS_DAS_RM_SHRINK:
> > > +		goto rm_shrink;
> > > +	default:
> > > +		break;
> > > +	}
> > >   	error = xfs_attr_node_hasname(args, &state);
> > >   	if (error != -EEXIST)
> > >   		goto out;
> > > +	else
> > > +		error = 0;
> > 
> > This doesn't look necessary.
> Well, at this point error has to be -EEXIST.  Which is great because we need
> the attr to exist, but we don't want to return that as error for this
> function.  Which can happen if error is not otherwise set.
> 

AFAICT every codepath after this assigns error one way or another before
it's returned. There's another error = 0 assignment just before the out:
label.

> > 
> > >   	/*
> > >   	 * If there is an out-of-line value, de-allocate the blocks.
> > > @@ -1243,6 +1311,14 @@ xfs_attr_node_removename(
> > >   	blk = &state->path.blk[ state->path.active-1 ];
> > >   	ASSERT(blk->bp != NULL);
> > >   	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
> > > +
> > > +	/*
> > > +	 * Store blk and state in the context incase we need to cycle out the
> > > +	 * transaction
> > > +	 */
> > > +	args->dac.blk = blk;
> > > +	args->dac.da_state = state;
> > > +
> > >   	if (args->rmtblkno > 0) {
> > >   		/*
> > >   		 * Fill in disk block numbers in the state structure
> > > @@ -1261,13 +1337,21 @@ xfs_attr_node_removename(
> > >   		if (error)
> > >   			goto out;
> > > -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > +		error = xfs_attr_rmtval_invalidate(args);
> > 
> > Remind me why we lose the above trans roll? I vaguely recall that this
> > was intentional, but I could be mistaken...
> I think we removed it in v5.  We used to have an XFS_DAS_RM_INVALIDATE
> state, but then we reasoned that because these are just in-core changes, we
> didn't need it, so we eliminated this state entirely.
> 
> Maybe I just add a comment here?  Just as a reminder
> 

Ah, Ok. Normally I'd say document things like this in the commit log so
we don't lose track, though I don't know how much space we have there.
;)

> > 
> > >   		if (error)
> > >   			goto out;
> > > +	}
> > > -		error = xfs_attr_rmtval_remove(args);
> > > -		if (error)
> > > -			goto out;
> > > +rm_node_blks:
> > > +
> > > +	if (args->rmtblkno > 0) {
> > > +		error = xfs_attr_rmtval_unmap(args);
> > > +
> > > +		if (error) {
> > > +			if (error == -EAGAIN)
> > > +				args->dac.dela_state = XFS_DAS_RMTVAL_REMOVE;
> > 
> > Might be helpful for the code labels to match the state names. I.e., use
> > das_rmtval_remove: for the label above.
> Sure, I can add the das prefix.
> 
> > 
> > > +			return error;
> > > +		}
> > >   		/*
> > >   		 * Refill the state structure with buffers, the prior calls
> > > @@ -1293,17 +1377,15 @@ xfs_attr_node_removename(
> > >   		error = xfs_da3_join(state);
> > >   		if (error)
> > >   			goto out;
> > > -		error = xfs_defer_finish(&args->trans);
> > > -		if (error)
> > > -			goto out;
> > > -		/*
> > > -		 * Commit the Btree join operation and start a new trans.
> > > -		 */
> > > -		error = xfs_trans_roll_inode(&args->trans, dp);
> > > -		if (error)
> > > -			goto out;
> > > +
> > > +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
> > > +		args->dac.dela_state = XFS_DAS_RM_SHRINK;
> > > +		return -EAGAIN;
> > >   	}
> > > +rm_shrink:
> > > +	args->dac.dela_state = XFS_DAS_RM_SHRINK;
> > > +
> > 
> > There's an xfs_defer_finish() call further down this function. Should
> > that be replaced with the flag?
> > 
> > Finally, I mentioned in a previous review that this function should
> > probably be further broken down before fitting in the state management
> > stuff. It doesn't look like that happened so I've attached a diff that
> > is just intended to give an idea of what I mean by sectioning off the
> > hunks that might be able to break down into helpers. The helpers
> > wouldn't contain any state management, so we create a clear separation
> > between the state code and functional components.
> Yes, it's xfs_attr_node_shrink in patch 15.  I moved it to another patch to
> try and keep the activity in this one to a minimum.  Apologies if it
> surprised you!  And then I mistakenly had taken the XFS_DAC_FINISH_TRANS
> flag with it.  I meant to keep all the state machine stuff here.  Will fix!
> 

Ok, I might have just not got there yet.

> I think this initial
> > refactoring would make the introduction of state much more simple
> 
> I guess I didn't think people would be partial to introducing helpers before
> or after the state logic.  I put them after in this set because the states
> are visible now, so I thought it would make the goal of modularizing code
> between the states more clear to folks.  Do you think I should move it back
> behind the state machine patches?
> 

I do think the refactoring should be done first. This does make it more
challenging for the developer (IMO) because I know I'd probably have to
hack around with the state bits to have a better idea of how to refactor
things in some cases, and then go back and retrofit the refactoring.

The advantage is that the heavy lifting in this series becomes agnostic
to the state bits. Refactoring patches are easier to review and we can
make progress because there's less of a need to carry those out of tree
through however many versions of the state code we'll need before
getting it merged. Once the code is sufficiently factored, the state
code should be much simpler to introduce and review since we hopefully
won't be jumping around into the middle of functions, multiple branches
of logic deep, etc.

(I see Dave commented similarly on a couple of the subsequent patches. I
100% agree with the approach he describes there and that is similar to
what I was trying to describe with the diff I attached in my earlier
mail...)

Brian

> (and
> > perhaps alleviate the need for the huge diagram).
> Well, I get the impression that people find the series sort of scary and
> maybe the diagrams help them a bit.  Maybe we can take them out later after
> people feel like they are comfortable with things?
> 
> It might also be
> > interesting to see how much of the result could be folded up further
> > into _removename_iter()...
> 
> Yes, I think that is the goal we're reaching for.  I will add the other
> helpers I see in your diff too.
> 
> Thanks for the reviews!
> Allison
> 
> > 
> > Brian
> > 
> > >   	/*
> > >   	 * If the result is small enough, push it all into the inode.
> > >   	 */
> > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > index ce7b039..ea873a5 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > @@ -155,6 +155,7 @@ int xfs_attr_set_args(struct xfs_da_args *args);
> > >   int xfs_attr_remove(struct xfs_inode *dp, struct xfs_name *name, int flags);
> > >   int xfs_has_attr(struct xfs_da_args *args);
> > >   int xfs_attr_remove_args(struct xfs_da_args *args);
> > > +int xfs_attr_remove_iter(struct xfs_da_args *args);
> > >   int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
> > >   		  int flags, struct attrlist_cursor_kern *cursor);
> > >   bool xfs_attr_namecheck(const void *name, size_t length);
> > > diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
> > > index 14f1be3..3c78498 100644
> > > --- a/fs/xfs/libxfs/xfs_da_btree.h
> > > +++ b/fs/xfs/libxfs/xfs_da_btree.h
> > > @@ -50,9 +50,39 @@ enum xfs_dacmp {
> > >   };
> > >   /*
> > > + * Enum values for xfs_delattr_context.da_state
> > > + *
> > > + * These values are used by delayed attribute operations to keep track  of where
> > > + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> > > + * calling function to roll the transaction, and then recall the subroutine to
> > > + * finish the operation.  The enum is then used by the subroutine to jump back
> > > + * to where it was and resume executing where it left off.
> > > + */
> > > +enum xfs_delattr_state {
> > > +	XFS_DAS_RM_SHRINK,	/* We are shrinking the tree */
> > > +	XFS_DAS_RMTVAL_REMOVE,	/* We are removing remote value blocks */
> > > +};
> > > +
> > > +/*
> > > + * Defines for xfs_delattr_context.flags
> > > + */
> > > +#define	XFS_DAC_FINISH_TRANS	0x1 /* indicates to finish the transaction */
> > > +
> > > +/*
> > > + * Context used for keeping track of delayed attribute operations
> > > + */
> > > +struct xfs_delattr_context {
> > > +	struct xfs_da_state	*da_state;
> > > +	struct xfs_da_state_blk *blk;
> > > +	unsigned int		flags;
> > > +	enum xfs_delattr_state	dela_state;
> > > +};
> > > +
> > > +/*
> > >    * Structure to ease passing around component names.
> > >    */
> > >   typedef struct xfs_da_args {
> > > +	struct xfs_delattr_context dac; /* context used for delay attr ops */
> > >   	struct xfs_da_geometry *geo;	/* da block geometry */
> > >   	struct xfs_name	name;		/* name, length and argument  flags*/
> > >   	uint8_t		filetype;	/* filetype of inode for directories */
> > > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > > index 1887605..9a649d1 100644
> > > --- a/fs/xfs/scrub/common.c
> > > +++ b/fs/xfs/scrub/common.c
> > > @@ -24,6 +24,8 @@
> > >   #include "xfs_rmap_btree.h"
> > >   #include "xfs_log.h"
> > >   #include "xfs_trans_priv.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_reflink.h"
> > >   #include "scrub/scrub.h"
> > > diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> > > index 42ac847..d65e6d8 100644
> > > --- a/fs/xfs/xfs_acl.c
> > > +++ b/fs/xfs/xfs_acl.c
> > > @@ -10,6 +10,8 @@
> > >   #include "xfs_trans_resv.h"
> > >   #include "xfs_mount.h"
> > >   #include "xfs_inode.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_trace.h"
> > >   #include "xfs_error.h"
> > > diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> > > index d37743b..881b9a4 100644
> > > --- a/fs/xfs/xfs_attr_list.c
> > > +++ b/fs/xfs/xfs_attr_list.c
> > > @@ -12,6 +12,7 @@
> > >   #include "xfs_trans_resv.h"
> > >   #include "xfs_mount.h"
> > >   #include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_inode.h"
> > >   #include "xfs_trans.h"
> > >   #include "xfs_bmap.h"
> > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > index 28c07c9..7c1d9da 100644
> > > --- a/fs/xfs/xfs_ioctl.c
> > > +++ b/fs/xfs/xfs_ioctl.c
> > > @@ -15,6 +15,8 @@
> > >   #include "xfs_iwalk.h"
> > >   #include "xfs_itable.h"
> > >   #include "xfs_error.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_bmap.h"
> > >   #include "xfs_bmap_util.h"
> > > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > > index 769581a..d504f8f 100644
> > > --- a/fs/xfs/xfs_ioctl32.c
> > > +++ b/fs/xfs/xfs_ioctl32.c
> > > @@ -17,6 +17,8 @@
> > >   #include "xfs_itable.h"
> > >   #include "xfs_fsops.h"
> > >   #include "xfs_rtalloc.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_ioctl.h"
> > >   #include "xfs_ioctl32.h"
> > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > index e85bbf5..a2d299f 100644
> > > --- a/fs/xfs/xfs_iops.c
> > > +++ b/fs/xfs/xfs_iops.c
> > > @@ -13,6 +13,8 @@
> > >   #include "xfs_inode.h"
> > >   #include "xfs_acl.h"
> > >   #include "xfs_quota.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_trans.h"
> > >   #include "xfs_trace.h"
> > > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> > > index 74133a5..d8dc72d 100644
> > > --- a/fs/xfs/xfs_xattr.c
> > > +++ b/fs/xfs/xfs_xattr.c
> > > @@ -10,6 +10,7 @@
> > >   #include "xfs_log_format.h"
> > >   #include "xfs_da_format.h"
> > >   #include "xfs_inode.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_acl.h"
> > > -- 
> > > 2.7.4
> > > 
> > 
>
Allison Henderson Feb. 26, 2020, 12:57 a.m. UTC | #7
On 2/25/20 1:57 AM, Dave Chinner wrote:
> On Sat, Feb 22, 2020 at 07:06:05PM -0700, Allison Collins wrote:
>> This patch modifies the attr remove routines to be delay ready. This means they no
>> longer roll or commit transactions, but instead return -EAGAIN to have the calling
>> routine roll and refresh the transaction. In this series, xfs_attr_remove_args has
>> become xfs_attr_remove_iter, which uses a sort of state machine like switch to keep
>> track of where it was when EAGAIN was returned. xfs_attr_node_removename has also
>> been modified to use the switch, and a  new version of xfs_attr_remove_args
>> consists of a simple loop to refresh the transaction until the operation is
>> completed.
>>
>> This patch also adds a new struct xfs_delattr_context, which we will use to keep
>> track of the current state of an attribute operation. The new xfs_delattr_state
>> enum is used to track various operations that are in progress so that we know not
>> to repeat them, and resume where we left off before EAGAIN was returned to cycle
>> out the transaction. Other members take the place of local variables that need
>> to retain their values across multiple function recalls.
>>
>> Below is a state machine diagram for attr remove operations. The XFS_DAS_* states
> 
> Ok, so I find all the DA/da prefixes in this code confusing,
> especially as they have very similar actual names. e.g. da_state
> vs delattr_state, DAS vs DA_STATE, etc.
> 
> Basically, I can't tell from reading the code what "DA" the actual
> variable belongs to, and in a few months time I'll most definitely
> have forgotten and have to relearn it from scratch.
> 
> So while "Delayed Attributes" is a great name for the feature, I
> don't think it makes a great acronym for shortening variable names
> because of the conflict with the existing DA namespace prefix.
> 
> Also, "dac" as shorthand for delattr context is also overloaded.
> "DAC" is "discretionary access control" and is quite widely used
> in the kernel (e.g. CAP_DAC_READ_SEARCH, CAP_DAC_OVERRIDE) so again
> I read this code and it doesn't make much sense.
> 
> I haven't come up with a better name - "attribute iterator" is the
> best I've managed (marketing++ - XFS has AI now!) and shortening it
> down to ator would go a long way to alleviating my namespace
> confusion....

Sure, no worries, there's still time to give it some thought
> 
>> indicate places where the function would return -EAGAIN, and then immediately
>> resume from after being recalled by the calling function.  States marked as a
>> "subroutine state" indicate that they belong to a subroutine, and so the calling
>> function needs to pass them back to that subroutine to allow it to finish where
>> it left off. But they otherwise do not have a role in the calling function other
>> than just passing through.
>>
>>   xfs_attr_remove_iter()
>>           XFS_DAS_RM_SHRINK     ─┐
>>           (subroutine state)     │
>>                                  │
>>           XFS_DAS_RMTVAL_REMOVE ─┤
>>           (subroutine state)     │
>>                                  └─>xfs_attr_node_removename()
>>                                                   │
>>                                                   v
>>                                           need to remove
>>                                     ┌─n──  rmt blocks?
>>                                     │             │
>>                                     │             y
>>                                     │             │
>>                                     │             v
>>                                     │  ┌─>XFS_DAS_RMTVAL_REMOVE
>>                                     │  │          │
>>                                     │  │          v
>>                                     │  └──y── more blks
>>                                     │         to remove?
>>                                     │             │
>>                                     │             n
>>                                     │             │
>>                                     │             v
>>                                     │         need to
>>                                     └─────> shrink tree? ─n─┐
>>                                                   │         │
>>                                                   y         │
>>                                                   │         │
>>                                                   v         │
>>                                           XFS_DAS_RM_SHRINK │
>>                                                   │         │
>>                                                   v         │
>>                                                  done <─────┘
> 
> Nice.
I'm glad people like those, I wasn't sure what people expected or what to 
expect as a response, but I think it helps facilitate the design at 
least for the time being :-)

> 
>>
>> Signed-off-by: Allison Collins <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c     | 114 +++++++++++++++++++++++++++++++++++++------
>>   fs/xfs/libxfs/xfs_attr.h     |   1 +
>>   fs/xfs/libxfs/xfs_da_btree.h |  30 ++++++++++++
>>   fs/xfs/scrub/common.c        |   2 +
>>   fs/xfs/xfs_acl.c             |   2 +
>>   fs/xfs/xfs_attr_list.c       |   1 +
>>   fs/xfs/xfs_ioctl.c           |   2 +
>>   fs/xfs/xfs_ioctl32.c         |   2 +
>>   fs/xfs/xfs_iops.c            |   2 +
>>   fs/xfs/xfs_xattr.c           |   1 +
>>   10 files changed, 141 insertions(+), 16 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index 5d73bdf..cd3a3f7 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -368,11 +368,60 @@ xfs_has_attr(
>>    */
>>   int
>>   xfs_attr_remove_args(
>> +	struct xfs_da_args	*args)
>> +{
>> +	int			error = 0;
>> +	int			err2 = 0;
>> +
>> +	do {
>> +		error = xfs_attr_remove_iter(args);
>> +		if (error && error != -EAGAIN)
>> +			goto out;
>> +
>> +		if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
>> +			args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
>> +
>> +			err2 = xfs_defer_finish(&args->trans);
>> +			if (err2) {
>> +				error = err2;
>> +				goto out;
>> +			}
>> +		}
>> +
>> +		err2 = xfs_trans_roll_inode(&args->trans, args->dp);
>> +		if (err2) {
>> +			error = err2;
>> +			goto out;
>> +		}
>> +
>> +	} while (error == -EAGAIN);
>> +out:
>> +	return error;
>> +}
> 
> Brian commented on the structure of this loop better than I could.
> 
>> +
>> +/*
>> + * Remove the attribute specified in @args.
>> + *
>> + * This function may return -EAGAIN to signal that the transaction needs to be
>> + * rolled.  Callers should continue calling this function until they receive a
>> + * return value other than -EAGAIN.
>> + */
>> +int
>> +xfs_attr_remove_iter(
>>   	struct xfs_da_args      *args)
>>   {
>>   	struct xfs_inode	*dp = args->dp;
>>   	int			error;
>>   
>> +	/* State machine switch */
>> +	switch (args->dac.dela_state) {
>> +	case XFS_DAS_RM_SHRINK:
>> +	case XFS_DAS_RMTVAL_REMOVE:
>> +		goto node;
>> +	default:
>> +		break;
>> +	}
> 
> Why separate out the state machine? Doesn't this shortcut the
> xfs_inode_hasattr() check? Shouldn't that come first?
Well, the idea is that when we first start the routine, we come in with 
neither state set, and we fall through to the break.  So we execute the 
check the first time through.

Though now that you point it out, I should probably go back and put the 
explicit numbering back in the enum (starting with 1) or they will 
default to zero, which would be incorrect.  I had pulled it out in one 
of the last reviews thinking it would be ok, but it should go back in.
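
A small illustration of the numbering problem, as a sketch with made-up
names (none of the sketch_* or SK_* identifiers come from the patch):
when the first enumerator is left implicit it equals zero, so a zeroed
context is indistinguishable from "shrink in progress".

```c
#include <string.h>

/* Hypothetical names, for illustration only. */
enum sketch_state_implicit {
	SK_IMPLICIT_SHRINK,		/* == 0, same as a zeroed context */
	SK_IMPLICIT_RMTVAL_REMOVE,	/* == 1 */
};

enum sketch_state_explicit {
	SK_EXPLICIT_SHRINK = 1,		/* zero is left to mean "not started" */
	SK_EXPLICIT_RMTVAL_REMOVE,	/* == 2 */
};

struct sketch_zeroed_ctx {
	int	state;
};

/* Returns 1 if a freshly zeroed context would be taken for "shrink". */
static int zeroed_ctx_looks_like_shrink(int shrink_value)
{
	struct sketch_zeroed_ctx ctx;

	memset(&ctx, 0, sizeof(ctx));
	return ctx.state == shrink_value;
}
```

Starting the numbering at 1 keeps the first pass on the default: branch
of the switch.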

> 
> As it is:
> 
> 	case XFS_DAS_RM_SHRINK:
> 	case XFS_DAS_RMTVAL_REMOVE:
> 		return xfs_attr_node_removename(args);
> 	default:
> 		break;
> 
> would be nicer, and if this is the only way we can get to
> xfs_attr_node_removename(), getting rid of it from the code
> below could be done, too.
Well, the remove path is a lot simpler than the set path, so that trick 
does work here :-)

The idea though was to establish "jump points" with the "XFS_DAS_*" 
states.  Based on the state, we jump back to where we were.  We could 
break this pattern for the remove path, but I don't think we'd want to do 
the same for the others.  The set routine is a really big function that 
would end up being inside a really big switch!
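
Roughly, the jump point pattern looks like the user-space sketch below.
All of the names (sketch_das, sketch_rm_ctx, sketch_node_remove) are
hypothetical, and the counters only stand in for the real remote block
removal, btree join and shrink; it is meant to show the control flow,
not the actual implementation.

```c
#include <errno.h>

/* Hypothetical states; zero is left to mean "not started". */
enum sketch_das {
	SKETCH_DAS_RMTVAL_REMOVE = 1,
	SKETCH_DAS_RM_SHRINK,
};

struct sketch_rm_ctx {
	enum sketch_das	state;
	int		rmt_blocks;	/* remote blocks still to remove */
	int		need_join;	/* non-zero if a join step is needed */
	int		shrunk;		/* set once the final step has run */
};

/*
 * Resumable remove: returns -EAGAIN whenever the caller should roll the
 * transaction, and uses ctx->state to jump back to where it left off.
 */
static int sketch_node_remove(struct sketch_rm_ctx *ctx)
{
	switch (ctx->state) {
	case SKETCH_DAS_RMTVAL_REMOVE:
		goto das_rmtval_remove;
	case SKETCH_DAS_RM_SHRINK:
		goto das_rm_shrink;
	default:
		break;
	}

	/* First pass only: lookup and setup would happen here. */

das_rmtval_remove:
	if (ctx->rmt_blocks > 0) {
		ctx->rmt_blocks--;		/* remove one block per pass */
		ctx->state = SKETCH_DAS_RMTVAL_REMOVE;
		return -EAGAIN;
	}

	if (ctx->need_join) {
		ctx->need_join = 0;		/* stands in for the join */
		ctx->state = SKETCH_DAS_RM_SHRINK;
		return -EAGAIN;
	}

das_rm_shrink:
	ctx->shrunk = 1;			/* stands in for the shrink */
	return 0;
}
```

The caller just keeps calling sketch_node_remove() with the same context
until it gets something other than -EAGAIN, rolling its transaction in
between.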

> 
> 
>> +
>>   	if (!xfs_inode_hasattr(dp)) {
>>   		error = -ENOATTR;
>>   	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
>> @@ -381,6 +430,7 @@ xfs_attr_remove_args(
>>   	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>>   		error = xfs_attr_leaf_removename(args);
>>   	} else {
>> +node:
>>   		error = xfs_attr_node_removename(args);
>>   	}
>>   
>> @@ -895,9 +945,8 @@ xfs_attr_leaf_removename(
>>   		/* bp is gone due to xfs_da_shrink_inode */
>>   		if (error)
>>   			return error;
>> -		error = xfs_defer_finish(&args->trans);
>> -		if (error)
>> -			return error;
>> +
>> +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
>>   	}
>>   	return 0;
>>   }
>> @@ -1218,6 +1267,11 @@ xfs_attr_node_addname(
>>    * This will involve walking down the Btree, and may involve joining
>>    * leaf nodes and even joining intermediate nodes up to and including
>>    * the root node (a special case of an intermediate node).
>> + *
>> + * This routine is meant to function as either an inline or delayed operation,
>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>> + * functions will need to handle this, and recall the function until a
>> + * successful error code is returned.
>>    */
>>   STATIC int
>>   xfs_attr_node_removename(
>> @@ -1230,10 +1284,24 @@ xfs_attr_node_removename(
>>   	struct xfs_inode	*dp = args->dp;
>>   
>>   	trace_xfs_attr_node_removename(args);
>> +	state = args->dac.da_state;
>> +	blk = args->dac.blk;
>> +
>> +	/* State machine switch */
>> +	switch (args->dac.dela_state) {
>> +	case XFS_DAS_RMTVAL_REMOVE:
>> +		goto rm_node_blks;
>> +	case XFS_DAS_RM_SHRINK:
>> +		goto rm_shrink;
>> +	default:
>> +		break;
>> +	}
> 
> This really is calling out for this function to be broken into three
> smaller functions. That would greatly simplify the code flow and
> logic here.
Yes, that is the goal we are working towards.

> 
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index ce7b039..ea873a5 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -155,6 +155,7 @@ int xfs_attr_set_args(struct xfs_da_args *args);
>>   int xfs_attr_remove(struct xfs_inode *dp, struct xfs_name *name, int flags);
>>   int xfs_has_attr(struct xfs_da_args *args);
>>   int xfs_attr_remove_args(struct xfs_da_args *args);
>> +int xfs_attr_remove_iter(struct xfs_da_args *args);
>>   int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
>>   		  int flags, struct attrlist_cursor_kern *cursor);
>>   bool xfs_attr_namecheck(const void *name, size_t length);
>> diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
>> index 14f1be3..3c78498 100644
>> --- a/fs/xfs/libxfs/xfs_da_btree.h
>> +++ b/fs/xfs/libxfs/xfs_da_btree.h
>> @@ -50,9 +50,39 @@ enum xfs_dacmp {
>>   };
>>   
>>   /*
>> + * Enum values for xfs_delattr_context.da_state
>> + *
>> + * These values are used by delayed attribute operations to keep track  of where
>> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
>> + * calling function to roll the transaction, and then recall the subroutine to
>> + * finish the operation.  The enum is then used by the subroutine to jump back
>> + * to where it was and resume executing where it left off.
>> + */
>> +enum xfs_delattr_state {
>> +	XFS_DAS_RM_SHRINK,	/* We are shrinking the tree */
Note to self: need to put the ordering back to starting at 1, not zero

>> +	XFS_DAS_RMTVAL_REMOVE,	/* We are removing remote value blocks */
>> +};
>> +
>> +/*
>> + * Defines for xfs_delattr_context.flags
>> + */
>> +#define	XFS_DAC_FINISH_TRANS	0x1 /* indicates to finish the transaction */
>> +
>> +/*
>> + * Context used for keeping track of delayed attribute operations
>> + */
>> +struct xfs_delattr_context {
>> +	struct xfs_da_state	*da_state;
>> +	struct xfs_da_state_blk *blk;
>> +	unsigned int		flags;
>> +	enum xfs_delattr_state	dela_state;
>> +};
>> +
>> +/*
>>    * Structure to ease passing around component names.
>>    */
>>   typedef struct xfs_da_args {
>> +	struct xfs_delattr_context dac; /* context used for delay attr ops */
> 
> Probably should put this at the end of the structure rather than the
> front.
Sure, will do

> 
> I'm also wondering if it should be kept separate to the da_args and
> contain a pointer to the da_args instead of being wrapped inside
> them.
> 
> i.e. we put the iterating state structure on the stack, then
> 
> 	struct attr_iter	ater = {
> 		.da_args = args,
> 	};
> 
> 	do {
> 		error = xfs_attr_remove_iter(&ater);
> 		.....
> 	
> And that largely separates the delayed attribute iteration state
> from the da_args that holds the internal attribute manipulation
> information.
Oh I see.  Sure, let me see if that will work, it seems like it should
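
If I understand the suggestion, it would look something like the
user-space sketch below. The names here (sketch_da_args,
sketch_attr_iter, sketch_remove_iter, sketch_remove_args) are
placeholders I made up, not the real structures, and the blocks_left
counter only stands in for actual work.

```c
#include <errno.h>

/* Hypothetical stand-in for the argument structure; not the XFS one. */
struct sketch_da_args {
	int	blocks_left;
};

/* Iteration state lives outside the args and only points at them. */
struct sketch_attr_iter {
	struct sketch_da_args	*da_args;
	int			state;	/* zero means "not started" */
};

static int sketch_remove_iter(struct sketch_attr_iter *iter)
{
	if (iter->da_args->blocks_left > 0) {
		iter->da_args->blocks_left--;
		iter->state = 1;	/* remember that work is in progress */
		return -EAGAIN;
	}
	return 0;
}

static int sketch_remove_args(struct sketch_da_args *args)
{
	/* The iterator is on the stack; the args know nothing about it. */
	struct sketch_attr_iter	iter = {
		.da_args = args,
	};
	int			error;

	do {
		error = sketch_remove_iter(&iter);
		/* A real caller would roll the transaction on -EAGAIN. */
	} while (error == -EAGAIN);

	return error;
}
```

Since the args structure would no longer embed the context, code that
only touches the args would not need to see the iterator definition.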

> 
>>   	struct xfs_da_geometry *geo;	/* da block geometry */
>>   	struct xfs_name	name;		/* name, length and argument  flags*/
>>   	uint8_t		filetype;	/* filetype of inode for directories */
>> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
>> index 1887605..9a649d1 100644
>> --- a/fs/xfs/scrub/common.c
>> +++ b/fs/xfs/scrub/common.c
>> @@ -24,6 +24,8 @@
>>   #include "xfs_rmap_btree.h"
>>   #include "xfs_log.h"
>>   #include "xfs_trans_priv.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_reflink.h"
>>   #include "scrub/scrub.h"
> 
> Hmmm - why are these new includes necessary? You didn't add anything
> new to these files or common header files to make the includes
> needed....

Because the delayed attr context uses things from those headers, and we 
put the context in xfs_da_args.  Now everything that uses xfs_da_args 
needs those includes.  But maybe if we do what you suggest above, we 
won't need to. :-)

Thanks for the reviews!  I know it's a lot!
Allison

> 
> Cheers,
> 
> Dave.
>
Allison Henderson Feb. 26, 2020, 5:36 a.m. UTC | #8
On 2/25/20 6:34 AM, Brian Foster wrote:
> On Mon, Feb 24, 2020 at 04:14:48PM -0700, Allison Collins wrote:
>> On 2/24/20 8:25 AM, Brian Foster wrote:
>>> On Sat, Feb 22, 2020 at 07:06:05PM -0700, Allison Collins wrote:
>>>> This patch modifies the attr remove routines to be delay ready. This means they no
>>>> longer roll or commit transactions, but instead return -EAGAIN to have the calling
>>>> routine roll and refresh the transaction. In this series, xfs_attr_remove_args has
>>>> become xfs_attr_remove_iter, which uses a sort of state machine like switch to keep
>>>> track of where it was when EAGAIN was returned. xfs_attr_node_removename has also
>>>> been modified to use the switch, and a  new version of xfs_attr_remove_args
>>>> consists of a simple loop to refresh the transaction until the operation is
>>>> completed.
>>>>
>>>> This patch also adds a new struct xfs_delattr_context, which we will use to keep
>>>> track of the current state of an attribute operation. The new xfs_delattr_state
>>>> enum is used to track various operations that are in progress so that we know not
>>>> to repeat them, and resume where we left off before EAGAIN was returned to cycle
>>>> out the transaction. Other members take the place of local variables that need
>>>> to retain their values across multiple function recalls.
>>>>
>>>> Below is a state machine diagram for attr remove operations. The XFS_DAS_* states
>>>> indicate places where the function would return -EAGAIN, and then immediately
>>>> resume from after being recalled by the calling function.  States marked as a
>>>> "subroutine state" indicate that they belong to a subroutine, and so the calling
>>>> function needs to pass them back to that subroutine to allow it to finish where
>>>> it left off. But they otherwise do not have a role in the calling function other
>>>> than just passing through.
>>>>
>>>>    xfs_attr_remove_iter()
>>>>            XFS_DAS_RM_SHRINK     ─┐
>>>>            (subroutine state)     │
>>>>                                   │
>>>>            XFS_DAS_RMTVAL_REMOVE ─┤
>>>>            (subroutine state)     │
>>>>                                   └─>xfs_attr_node_removename()
>>>>                                                    │
>>>>                                                    v
>>>>                                            need to remove
>>>>                                      ┌─n──  rmt blocks?
>>>>                                      │             │
>>>>                                      │             y
>>>>                                      │             │
>>>>                                      │             v
>>>>                                      │  ┌─>XFS_DAS_RMTVAL_REMOVE
>>>>                                      │  │          │
>>>>                                      │  │          v
>>>>                                      │  └──y── more blks
>>>>                                      │         to remove?
>>>>                                      │             │
>>>>                                      │             n
>>>>                                      │             │
>>>>                                      │             v
>>>>                                      │         need to
>>>>                                      └─────> shrink tree? ─n─┐
>>>>                                                    │         │
>>>>                                                    y         │
>>>>                                                    │         │
>>>>                                                    v         │
>>>>                                            XFS_DAS_RM_SHRINK │
>>>>                                                    │         │
>>>>                                                    v         │
>>>>                                                   done <─────┘
>>>>
>>>
>>> Wow. :P I guess I have nothing against verbose commit logs, but I wonder
>>> how useful this level of documentation is for a patch that shouldn't
>>> really change the existing flow of the operation.
>>
>> Yes Darrick had requested a diagram in the last review, so I had put this
>> together.  I wasnt sure where the best place to put it even was, so I put it
>> here at least for now.  I have no idea if there is a limit on commit message
>> length, but if there is, I'm pretty sure I blew right past it in this patch
>> and the next.  Maybe if anything it can just be here for now while we work
>> through things?
>>
> 
> No problem.. if it's useful it's good to have a record of it around
> somewhere until the end result is more stabilized and we can determine
> whether this warrants a permanent home somewhere in the code.
> 
>>>
>>>> Signed-off-by: Allison Collins <allison.henderson@oracle.com>
>>>> ---
>>>>    fs/xfs/libxfs/xfs_attr.c     | 114 +++++++++++++++++++++++++++++++++++++------
>>>>    fs/xfs/libxfs/xfs_attr.h     |   1 +
>>>>    fs/xfs/libxfs/xfs_da_btree.h |  30 ++++++++++++
>>>>    fs/xfs/scrub/common.c        |   2 +
>>>>    fs/xfs/xfs_acl.c             |   2 +
>>>>    fs/xfs/xfs_attr_list.c       |   1 +
>>>>    fs/xfs/xfs_ioctl.c           |   2 +
>>>>    fs/xfs/xfs_ioctl32.c         |   2 +
>>>>    fs/xfs/xfs_iops.c            |   2 +
>>>>    fs/xfs/xfs_xattr.c           |   1 +
>>>>    10 files changed, 141 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>>>> index 5d73bdf..cd3a3f7 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr.c
>>>> +++ b/fs/xfs/libxfs/xfs_attr.c
>>>> @@ -368,11 +368,60 @@ xfs_has_attr(
>>>>     */
>>>>    int
>>>>    xfs_attr_remove_args(
>>>> +	struct xfs_da_args	*args)
>>>> +{
>>>> +	int			error = 0;
>>>> +	int			err2 = 0;
>>>> +
>>>> +	do {
>>>> +		error = xfs_attr_remove_iter(args);
>>>> +		if (error && error != -EAGAIN)
>>>> +			goto out;
>>>> +
>>>
>>> I'm a little confused on the logic of this loop given that the only
>>> caller commits the transaction (which also finishes dfops). IOW, it
>>> seems we shouldn't ever need to finish/roll when error != -EAGAIN. If
>>> that is the case, this can be simplified to something like:
>> Well, we need to do it when error == -EAGAIN or 0, right? Which I think
>> better imitates the defer_finish routines.  That's why a lot of the existing
>> code that just finishes off with a transaction just sort of gets sawed off
>> at the end. Otherwise they would need one more state just to return -EAGAIN
>> as the last thing they have to do. Did that make sense?
>>
> 
> Hmm.. I could just be missing something or not far along enough in the
> series. Can you point me at an example of where we need to finish/roll
> before the caller of xfs_attr_remove_args() commits the transaction?
> 
Ok, in looking for an example, I realized all such examples appear in the 
next patch ;-)  So maybe we can get away with simplifying it in this patch.

For the next patch though, it's any place where the roll/finish disappears 
and no "return -EAGAIN" takes its place.  For example, at the end of 
xfs_attr_leaf_addname.

>>>
>>> int
>>> xfs_attr_remove_args(
>>>           struct xfs_da_args      *args)
>>> {
>>>           int                     error;
>>>
>>>           do {
>>>                   error = xfs_attr_remove_iter(args);
>>>                   if (error != -EAGAIN)
>>>                           break;
>>>
>>>                   if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
>>>                           args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
>>>                           error = xfs_defer_finish(&args->trans);
>>>                           if (error)
>>>                                   break;
>>>                   }
>>>
>>>                   error = xfs_trans_roll_inode(&args->trans, args->dp);
>>>                   if (error)
>>>                           break;
>>>           } while (true);
>>>
>>>           return error;
>>> }
>>>
>>> That has the added benefit of eliminating the whole err2 pattern, which
>>> always strikes me as a landmine.
>>>
>>>> +		if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
>>>
>>> BTW, _FINISH_TRANS also seems misnamed given that we finish deferred
>>> operations, not necessarily the transaction. XFS_DAC_DEFER_FINISH?
>> Sure, will update
>>
>>>
>>>> +			args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
>>>> +
>>>> +			err2 = xfs_defer_finish(&args->trans);
>>>> +			if (err2) {
>>>> +				error = err2;
>>>> +				goto out;
>>>> +			}
>>>> +		}
>>>> +
>>>> +		err2 = xfs_trans_roll_inode(&args->trans, args->dp);
>>>> +		if (err2) {
>>>> +			error = err2;
>>>> +			goto out;
>>>> +		}
>>>> +
>>>> +	} while (error == -EAGAIN);
>>>> +out:
>>>> +	return error;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Remove the attribute specified in @args.
>>>> + *
>>>> + * This function may return -EAGAIN to signal that the transaction needs to be
>>>> + * rolled.  Callers should continue calling this function until they receive a
>>>> + * return value other than -EAGAIN.
>>>> + */
>>>> +int
>>>> +xfs_attr_remove_iter(
>>>>    	struct xfs_da_args      *args)
>>>>    {
>>>>    	struct xfs_inode	*dp = args->dp;
>>>>    	int			error;
>>>> +	/* State machine switch */
>>>> +	switch (args->dac.dela_state) {
>>>> +	case XFS_DAS_RM_SHRINK:
>>>> +	case XFS_DAS_RMTVAL_REMOVE:
>>>> +		goto node;
>>>> +	default:
>>>> +		break;
>>>> +	}
>>>> +
>>>>    	if (!xfs_inode_hasattr(dp)) {
>>>>    		error = -ENOATTR;
>>>>    	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
>>>> @@ -381,6 +430,7 @@ xfs_attr_remove_args(
>>>>    	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>>>>    		error = xfs_attr_leaf_removename(args);
>>>>    	} else {
>>>> +node:
>>>>    		error = xfs_attr_node_removename(args);
>>>>    	}
>>>> @@ -895,9 +945,8 @@ xfs_attr_leaf_removename(
>>>>    		/* bp is gone due to xfs_da_shrink_inode */
>>>>    		if (error)
>>>>    			return error;
>>>> -		error = xfs_defer_finish(&args->trans);
>>>> -		if (error)
>>>> -			return error;
>>>> +
>>>> +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
>>>>    	}
>>>>    	return 0;
>>>>    }
>>>> @@ -1218,6 +1267,11 @@ xfs_attr_node_addname(
>>>>     * This will involve walking down the Btree, and may involve joining
>>>>     * leaf nodes and even joining intermediate nodes up to and including
>>>>     * the root node (a special case of an intermediate node).
>>>> + *
>>>> + * This routine is meant to function as either an inline or delayed operation,
>>>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>>>> + * functions will need to handle this, and recall the function until a
>>>> + * successful error code is returned.
>>>>     */
>>>>    STATIC int
>>>>    xfs_attr_node_removename(
>>>> @@ -1230,10 +1284,24 @@ xfs_attr_node_removename(
>>>>    	struct xfs_inode	*dp = args->dp;
>>>>    	trace_xfs_attr_node_removename(args);
>>>> +	state = args->dac.da_state;
>>>> +	blk = args->dac.blk;
>>>> +
>>>> +	/* State machine switch */
>>>> +	switch (args->dac.dela_state) {
>>>> +	case XFS_DAS_RMTVAL_REMOVE:
>>>> +		goto rm_node_blks;
>>>> +	case XFS_DAS_RM_SHRINK:
>>>> +		goto rm_shrink;
>>>> +	default:
>>>> +		break;
>>>> +	}
>>>>    	error = xfs_attr_node_hasname(args, &state);
>>>>    	if (error != -EEXIST)
>>>>    		goto out;
>>>> +	else
>>>> +		error = 0;
>>>
>>> This doesn't look necessary.
>> Well, at this point error has to be -EEXIST.  Which is great because we need
>> the attr to exist, but we dont want to return that as error for this
>> function.  Which can happen if error is not otherwise set.
>>
> 
> AFAICT every codepath after this assigns error one way or another before
> it's returned. There's another error = 0 assignment just before the out:
> label.
Ok, I see it.  Will remove.

> 
>>>
>>>>    	/*
>>>>    	 * If there is an out-of-line value, de-allocate the blocks.
>>>> @@ -1243,6 +1311,14 @@ xfs_attr_node_removename(
>>>>    	blk = &state->path.blk[ state->path.active-1 ];
>>>>    	ASSERT(blk->bp != NULL);
>>>>    	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
>>>> +
>>>> +	/*
>>>> +	 * Store blk and state in the context incase we need to cycle out the
>>>> +	 * transaction
>>>> +	 */
>>>> +	args->dac.blk = blk;
>>>> +	args->dac.da_state = state;
>>>> +
>>>>    	if (args->rmtblkno > 0) {
>>>>    		/*
>>>>    		 * Fill in disk block numbers in the state structure
>>>> @@ -1261,13 +1337,21 @@ xfs_attr_node_removename(
>>>>    		if (error)
>>>>    			goto out;
>>>> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
>>>> +		error = xfs_attr_rmtval_invalidate(args);
>>>
>>> Remind me why we lose the above trans roll? I vaguely recall that this
>>> was intentional, but I could be mistaken...
>> I think we removed it in v5.  We used to have a  XFS_DAS_RM_INVALIDATE
>> state, but then we reasoned that because these are just in-core changes, we
>> didnt need it, so we eliminated this state entirely.
>>
>> Maybe i just add a comment here?  Just as a reminder
>>
> 
> Ah, Ok. Normally I'd say document things like this in the commit log so
> we don't lose track, though I don't know how much space we have there.
> ;)
Ok, I'll see if I can squeeze in a few more lines :-)

> 
>>>
>>>>    		if (error)
>>>>    			goto out;
>>>> +	}
>>>> -		error = xfs_attr_rmtval_remove(args);
>>>> -		if (error)
>>>> -			goto out;
>>>> +rm_node_blks:
>>>> +
>>>> +	if (args->rmtblkno > 0) {
>>>> +		error = xfs_attr_rmtval_unmap(args);
>>>> +
>>>> +		if (error) {
>>>> +			if (error == -EAGAIN)
>>>> +				args->dac.dela_state = XFS_DAS_RMTVAL_REMOVE;
>>>
>>> Might be helpful for the code labels to match the state names. I.e., use
>>> das_rmtval_remove: for the label above.
>> Sure, I can update add the das prefix.
>>
>>>
>>>> +			return error;
>>>> +		}
>>>>    		/*
>>>>    		 * Refill the state structure with buffers, the prior calls
>>>> @@ -1293,17 +1377,15 @@ xfs_attr_node_removename(
>>>>    		error = xfs_da3_join(state);
>>>>    		if (error)
>>>>    			goto out;
>>>> -		error = xfs_defer_finish(&args->trans);
>>>> -		if (error)
>>>> -			goto out;
>>>> -		/*
>>>> -		 * Commit the Btree join operation and start a new trans.
>>>> -		 */
>>>> -		error = xfs_trans_roll_inode(&args->trans, dp);
>>>> -		if (error)
>>>> -			goto out;
>>>> +
>>>> +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
>>>> +		args->dac.dela_state = XFS_DAS_RM_SHRINK;
>>>> +		return -EAGAIN;
>>>>    	}
>>>> +rm_shrink:
>>>> +	args->dac.dela_state = XFS_DAS_RM_SHRINK;
>>>> +
>>>
>>> There's an xfs_defer_finish() call further down this function. Should
>>> that be replaced with the flag?
>>>
>>> Finally, I mentioned in a previous review that this function should
>>> probably be further broken down before fitting in the state management
>>> stuff. It doesn't look like that happened so I've attached a diff that
>>> is just intended to give an idea of what I mean by sectioning off the
>>> hunks that might be able to break down into helpers. The helpers
>>> wouldn't contain any state management, so we create a clear separation
>>> between the state code and functional components.
>> Yes, it's xfs_attr_node_shrink in patch 15.  I moved it to another patch to
>> try and keep the activity in this one to a minimum.  Apologies if it
>> surprised you!  And then i mistakenly had taken the XFS_DAC_FINISH_TRANS
>> flag with it.  I meant to keep all the state machine stuff here.  Will fix!
>>
> 
> Ok, I might have just not got there yet.
> 
>> I think this initial
>>> refactoring would make the introduction of state much more simple
>>
>> I guess I didn't think people would be partial to introducing helpers before
>> or after the state logic.  I put them after in this set because the states
>> are visible now, so I though it would make the goal of modularizing code
>> between the states more clear to folks.  Do you think I should move it back
>> behind the state machine patches?
>>
> 
> I do think the refactoring should be done first. This does make it more
> challenging for the developer (IMO) because I know I'd probably have to
> hack around with the state bits to have a better idea of how to refactor
> things in some cases, and then go back and retrofit the refactoring.
> 
> The advantage is that the heavy lifting in this series becomes agnostic
> to the state bits. Refactoring patches are easier to review and we can
> make progress because there's less of a need to carry those out of tree
> through however many versions of the state code we'll need before
> getting it merged. Once the code is sufficiently factored, the state
> code should be much simpler to introduce and review since we hopefully
> won't be jumping around into the middle of functions, multiple branches
> of logic deep, etc.
> 
> (I see Dave commented similarly on a couple of the subsequent patches. I
> 100% agree with the approach he describes there and that is similar to
> what I was trying to describe with the diff I attached in my earlier
> mail...)
> 
> Brian

Alrighty then, will move back.  Thanks, and thanks again for the reviews!!

Allison

> 
>> (and
>>> perhaps alleviate the need for the huge diagram).
>> Well, I get the impression that people find the series sort of scary and
>> maybe the diagrams help them a bit.  Maybe we can take them out later after
>> people feel like they are comfortable with things?
>>
>> It might also be
>>> interesting to see how much of the result could be folded up further
>>> into _removename_iter()...
>>
>> Yes, I think that is the goal we're reaching for.  I will add the other
>> helpers I see in your diff too.
>>
>> Thanks for the reviews!
>> Allison
>>
>>>
>>> Brian
>>>
>>>>    	/*
>>>>    	 * If the result is small enough, push it all into the inode.
>>>>    	 */
>>>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>>>> index ce7b039..ea873a5 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr.h
>>>> +++ b/fs/xfs/libxfs/xfs_attr.h
>>>> @@ -155,6 +155,7 @@ int xfs_attr_set_args(struct xfs_da_args *args);
>>>>    int xfs_attr_remove(struct xfs_inode *dp, struct xfs_name *name, int flags);
>>>>    int xfs_has_attr(struct xfs_da_args *args);
>>>>    int xfs_attr_remove_args(struct xfs_da_args *args);
>>>> +int xfs_attr_remove_iter(struct xfs_da_args *args);
>>>>    int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
>>>>    		  int flags, struct attrlist_cursor_kern *cursor);
>>>>    bool xfs_attr_namecheck(const void *name, size_t length);
>>>> diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
>>>> index 14f1be3..3c78498 100644
>>>> --- a/fs/xfs/libxfs/xfs_da_btree.h
>>>> +++ b/fs/xfs/libxfs/xfs_da_btree.h
>>>> @@ -50,9 +50,39 @@ enum xfs_dacmp {
>>>>    };
>>>>    /*
>>>> + * Enum values for xfs_delattr_context.da_state
>>>> + *
>>>> + * These values are used by delayed attribute operations to keep track  of where
>>>> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
>>>> + * calling function to roll the transaction, and then recall the subroutine to
>>>> + * finish the operation.  The enum is then used by the subroutine to jump back
>>>> + * to where it was and resume executing where it left off.
>>>> + */
>>>> +enum xfs_delattr_state {
>>>> +	XFS_DAS_RM_SHRINK,	/* We are shrinking the tree */
>>>> +	XFS_DAS_RMTVAL_REMOVE,	/* We are removing remote value blocks */
>>>> +};
>>>> +
>>>> +/*
>>>> + * Defines for xfs_delattr_context.flags
>>>> + */
>>>> +#define	XFS_DAC_FINISH_TRANS	0x1 /* indicates to finish the transaction */
>>>> +
>>>> +/*
>>>> + * Context used for keeping track of delayed attribute operations
>>>> + */
>>>> +struct xfs_delattr_context {
>>>> +	struct xfs_da_state	*da_state;
>>>> +	struct xfs_da_state_blk *blk;
>>>> +	unsigned int		flags;
>>>> +	enum xfs_delattr_state	dela_state;
>>>> +};
>>>> +
>>>> +/*
>>>>     * Structure to ease passing around component names.
>>>>     */
>>>>    typedef struct xfs_da_args {
>>>> +	struct xfs_delattr_context dac; /* context used for delay attr ops */
>>>>    	struct xfs_da_geometry *geo;	/* da block geometry */
>>>>    	struct xfs_name	name;		/* name, length and argument  flags*/
>>>>    	uint8_t		filetype;	/* filetype of inode for directories */
>>>> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
>>>> index 1887605..9a649d1 100644
>>>> --- a/fs/xfs/scrub/common.c
>>>> +++ b/fs/xfs/scrub/common.c
>>>> @@ -24,6 +24,8 @@
>>>>    #include "xfs_rmap_btree.h"
>>>>    #include "xfs_log.h"
>>>>    #include "xfs_trans_priv.h"
>>>> +#include "xfs_da_format.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_attr.h"
>>>>    #include "xfs_reflink.h"
>>>>    #include "scrub/scrub.h"
>>>> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
>>>> index 42ac847..d65e6d8 100644
>>>> --- a/fs/xfs/xfs_acl.c
>>>> +++ b/fs/xfs/xfs_acl.c
>>>> @@ -10,6 +10,8 @@
>>>>    #include "xfs_trans_resv.h"
>>>>    #include "xfs_mount.h"
>>>>    #include "xfs_inode.h"
>>>> +#include "xfs_da_format.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_attr.h"
>>>>    #include "xfs_trace.h"
>>>>    #include "xfs_error.h"
>>>> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
>>>> index d37743b..881b9a4 100644
>>>> --- a/fs/xfs/xfs_attr_list.c
>>>> +++ b/fs/xfs/xfs_attr_list.c
>>>> @@ -12,6 +12,7 @@
>>>>    #include "xfs_trans_resv.h"
>>>>    #include "xfs_mount.h"
>>>>    #include "xfs_da_format.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_inode.h"
>>>>    #include "xfs_trans.h"
>>>>    #include "xfs_bmap.h"
>>>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>>>> index 28c07c9..7c1d9da 100644
>>>> --- a/fs/xfs/xfs_ioctl.c
>>>> +++ b/fs/xfs/xfs_ioctl.c
>>>> @@ -15,6 +15,8 @@
>>>>    #include "xfs_iwalk.h"
>>>>    #include "xfs_itable.h"
>>>>    #include "xfs_error.h"
>>>> +#include "xfs_da_format.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_attr.h"
>>>>    #include "xfs_bmap.h"
>>>>    #include "xfs_bmap_util.h"
>>>> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
>>>> index 769581a..d504f8f 100644
>>>> --- a/fs/xfs/xfs_ioctl32.c
>>>> +++ b/fs/xfs/xfs_ioctl32.c
>>>> @@ -17,6 +17,8 @@
>>>>    #include "xfs_itable.h"
>>>>    #include "xfs_fsops.h"
>>>>    #include "xfs_rtalloc.h"
>>>> +#include "xfs_da_format.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_attr.h"
>>>>    #include "xfs_ioctl.h"
>>>>    #include "xfs_ioctl32.h"
>>>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>>>> index e85bbf5..a2d299f 100644
>>>> --- a/fs/xfs/xfs_iops.c
>>>> +++ b/fs/xfs/xfs_iops.c
>>>> @@ -13,6 +13,8 @@
>>>>    #include "xfs_inode.h"
>>>>    #include "xfs_acl.h"
>>>>    #include "xfs_quota.h"
>>>> +#include "xfs_da_format.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_attr.h"
>>>>    #include "xfs_trans.h"
>>>>    #include "xfs_trace.h"
>>>> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
>>>> index 74133a5..d8dc72d 100644
>>>> --- a/fs/xfs/xfs_xattr.c
>>>> +++ b/fs/xfs/xfs_xattr.c
>>>> @@ -10,6 +10,7 @@
>>>>    #include "xfs_log_format.h"
>>>>    #include "xfs_da_format.h"
>>>>    #include "xfs_inode.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_attr.h"
>>>>    #include "xfs_acl.h"
>>>> -- 
>>>> 2.7.4
>>>>
>>>
>>
>
Brian Foster Feb. 26, 2020, 1:48 p.m. UTC | #9
On Tue, Feb 25, 2020 at 10:36:18PM -0700, Allison Collins wrote:
> 
> 
> On 2/25/20 6:34 AM, Brian Foster wrote:
> > On Mon, Feb 24, 2020 at 04:14:48PM -0700, Allison Collins wrote:
> > > On 2/24/20 8:25 AM, Brian Foster wrote:
> > > > On Sat, Feb 22, 2020 at 07:06:05PM -0700, Allison Collins wrote:
> > > > > This patch modifies the attr remove routines to be delay ready. This means they no
> > > > > longer roll or commit transactions, but instead return -EAGAIN to have the calling
> > > > > routine roll and refresh the transaction. In this series, xfs_attr_remove_args has
> > > > > become xfs_attr_remove_iter, which uses a sort of state machine like switch to keep
> > > > > track of where it was when EAGAIN was returned. xfs_attr_node_removename has also
> > > > > been modified to use the switch, and a  new version of xfs_attr_remove_args
> > > > > consists of a simple loop to refresh the transaction until the operation is
> > > > > completed.
> > > > > 
> > > > > This patch also adds a new struct xfs_delattr_context, which we will use to keep
> > > > > track of the current state of an attribute operation. The new xfs_delattr_state
> > > > > enum is used to track various operations that are in progress so that we know not
> > > > > to repeat them, and resume where we left off before EAGAIN was returned to cycle
> > > > > out the transaction. Other members take the place of local variables that need
> > > > > to retain their values across multiple function recalls.
> > > > > 
> > > > > Below is a state machine diagram for attr remove operations. The XFS_DAS_* states
> > > > > indicate places where the function would return -EAGAIN, and then immediately
> > > > > resume from after being recalled by the calling function.  States marked as a
> > > > > "subroutine state" indicate that they belong to a subroutine, and so the calling
> > > > > function needs to pass them back to that subroutine to allow it to finish where
> > > > > it left off. But they otherwise do not have a role in the calling function other
> > > > > than just passing through.
> > > > > 
> > > > >    xfs_attr_remove_iter()
> > > > >            XFS_DAS_RM_SHRINK     ─┐
> > > > >            (subroutine state)     │
> > > > >                                   │
> > > > >            XFS_DAS_RMTVAL_REMOVE ─┤
> > > > >            (subroutine state)     │
> > > > >                                   └─>xfs_attr_node_removename()
> > > > >                                                    │
> > > > >                                                    v
> > > > >                                            need to remove
> > > > >                                      ┌─n──  rmt blocks?
> > > > >                                      │             │
> > > > >                                      │             y
> > > > >                                      │             │
> > > > >                                      │             v
> > > > >                                      │  ┌─>XFS_DAS_RMTVAL_REMOVE
> > > > >                                      │  │          │
> > > > >                                      │  │          v
> > > > >                                      │  └──y── more blks
> > > > >                                      │         to remove?
> > > > >                                      │             │
> > > > >                                      │             n
> > > > >                                      │             │
> > > > >                                      │             v
> > > > >                                      │         need to
> > > > >                                      └─────> shrink tree? ─n─┐
> > > > >                                                    │         │
> > > > >                                                    y         │
> > > > >                                                    │         │
> > > > >                                                    v         │
> > > > >                                            XFS_DAS_RM_SHRINK │
> > > > >                                                    │         │
> > > > >                                                    v         │
> > > > >                                                   done <─────┘
> > > > > 
> > > > 
> > > > Wow. :P I guess I have nothing against verbose commit logs, but I wonder
> > > > how useful this level of documentation is for a patch that shouldn't
> > > > really change the existing flow of the operation.
> > > 
> > > Yes Darrick had requested a diagram in the last review, so I had put this
> > > together.  I wasnt sure where the best place to put it even was, so I put it
> > > here at least for now.  I have no idea if there is a limit on commit message
> > > length, but if there is, I'm pretty sure I blew right past it in this patch
> > > and the next.  Maybe if anything it can just be here for now while we work
> > > through things?
> > > 
> > 
> > No problem.. if it's useful it's good to have a record of it around
> > somewhere until the end result is more stabilized and we can determine
> > whether this warrants a permanent home somewhere in the code.
> > 
> > > > 
> > > > > Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> > > > > ---
> > > > >    fs/xfs/libxfs/xfs_attr.c     | 114 +++++++++++++++++++++++++++++++++++++------
> > > > >    fs/xfs/libxfs/xfs_attr.h     |   1 +
> > > > >    fs/xfs/libxfs/xfs_da_btree.h |  30 ++++++++++++
> > > > >    fs/xfs/scrub/common.c        |   2 +
> > > > >    fs/xfs/xfs_acl.c             |   2 +
> > > > >    fs/xfs/xfs_attr_list.c       |   1 +
> > > > >    fs/xfs/xfs_ioctl.c           |   2 +
> > > > >    fs/xfs/xfs_ioctl32.c         |   2 +
> > > > >    fs/xfs/xfs_iops.c            |   2 +
> > > > >    fs/xfs/xfs_xattr.c           |   1 +
> > > > >    10 files changed, 141 insertions(+), 16 deletions(-)
> > > > > 
> > > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > > > index 5d73bdf..cd3a3f7 100644
> > > > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > > > @@ -368,11 +368,60 @@ xfs_has_attr(
> > > > >     */
> > > > >    int
> > > > >    xfs_attr_remove_args(
> > > > > +	struct xfs_da_args	*args)
> > > > > +{
> > > > > +	int			error = 0;
> > > > > +	int			err2 = 0;
> > > > > +
> > > > > +	do {
> > > > > +		error = xfs_attr_remove_iter(args);
> > > > > +		if (error && error != -EAGAIN)
> > > > > +			goto out;
> > > > > +
> > > > 
> > > > I'm a little confused on the logic of this loop given that the only
> > > > caller commits the transaction (which also finishes dfops). IOW, it
> > > > seems we shouldn't ever need to finish/roll when error != -EAGAIN. If
> > > > that is the case, this can be simplified to something like:
> > > Well, we need to do it when error == -EAGAIN or 0, right? Which I think
> > > better imitates the defer_finish routines.  That's why a lot of the existing
> > > code that just finishes off with a transaction just sort of gets sawed off
> > > at the end. Otherwise they would need one more state just to return -EAGAIN
> > > as the last thing they have to do. Did that make sense?
> > > 
> > 
> > Hmm.. I could just be missing something or not far along enough in the
> > series. Can you point me at an example of where we need to finish/roll
> > before the caller of xfs_attr_remove_args() commits the transaction?
> > 
> Ok, in looking for an example, I realized all such examples appear in the next
> patch ;-)  So maybe we can get away with simplifying it in this patch.
> 

Ah, Ok. Yeah, I think that would be best so long as it is correct, since
right now at least we have separate xfs_attr_[set|remove]_args() loop
functions and I didn't see any code that warranted the extra roll in the
remove path.

> For the next patch though, it's any place the roll/finish disappears, and an
> "return -EAGAIN" does not.  For example, at the end of
> xfs_attr_leaf_addname.
> 

I see, thanks. Hmmm... so I think that particular example is basically a
programming pattern thing more so than a functional requirement. I.e.,
the current _clearflag() function clears the flag and rolls the
transaction perhaps simply so it can be reliably used in different
contexts. The use in the _addname() case is functionally spurious afaict
because we roll the transaction only to make no further changes and then
commit the final transaction in the higher level code.

I could see leaving the loop as is if this were the case for every exit
path back to xfs_attr_set_args(), but is that really the case? If not,
haven't we introduced a spurious roll for any zero return back to the
_args() function? I think it might be best to fix up the loop to not
roll on error == 0, explicitly plumb in the -EAGAIN in those spurious
cases like _addname() where we currently roll, and then come up with a
follow up patch to remove the ones that end up as spurious. That way
we're not conflating too much refactoring with functional change and can
review/document the functional change independently (i.e., if removing
one of those rolls ends up introducing a bug, we don't have to revert an
entire refactoring patch to restore original behavior).
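
To make the roll-count difference concrete, here is a minimal userspace
sketch, not kernel code. Every demo_* name is a hypothetical stand-in
(demo_iter() for an _iter() function, demo_roll() for a transaction roll),
and the roll is assumed to always succeed. It only models how many rolls
each loop shape performs.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the operation context; not a kernel type. */
struct demo_args {
	int	steps_left;	/* times the iter will still return -EAGAIN */
	int	rolls;		/* transaction rolls performed so far */
};

/* Models an _iter() function: -EAGAIN while work remains, then 0. */
static int demo_iter(struct demo_args *args)
{
	assert(args->steps_left >= 0);	/* sanity check on the model */
	if (args->steps_left > 0) {
		args->steps_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Models a transaction roll; assumed to always succeed in this sketch. */
static int demo_roll(struct demo_args *args)
{
	args->rolls++;
	return 0;
}

/* Loop shape that rolls only when the iter returns -EAGAIN. */
static int demo_args_loop(struct demo_args *args)
{
	int	error;

	do {
		error = demo_iter(args);
		if (error != -EAGAIN)
			break;

		error = demo_roll(args);
		if (error)
			break;
	} while (1);

	return error;
}

/* Loop shape that also rolls after a final zero return. */
static int demo_args_loop_roll_on_zero(struct demo_args *args)
{
	int	error;
	int	err2;

	do {
		error = demo_iter(args);
		if (error && error != -EAGAIN)
			break;

		err2 = demo_roll(args);
		if (err2) {
			error = err2;
			break;
		}
	} while (error == -EAGAIN);

	return error;
}
```

With an iter that returns -EAGAIN twice before finishing, the first loop
rolls twice, while the second rolls a third time right before the caller
would commit anyway.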

Now that I think of it, the better option is probably to remove the
xfs_trans_roll_inode() call from _addname() first, before these patches
introduce the delay ready infrastructure, since it's already isolated as
spurious at that point. That should be a simple patch with a
clear/obvious explanation.

Brian

> > > > 
> > > > int
> > > > xfs_attr_remove_args(
> > > >           struct xfs_da_args      *args)
> > > > {
> > > >           int                     error;
> > > > 
> > > >           do {
> > > >                   error = xfs_attr_remove_iter(args);
> > > >                   if (error != -EAGAIN)
> > > >                           break;
> > > > 
> > > >                   if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
> > > >                           args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
> > > >                           error = xfs_defer_finish(&args->trans);
> > > >                           if (error)
> > > >                                   break;
> > > >                   }
> > > > 
> > > >                   error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > >                   if (error)
> > > >                           break;
> > > >           } while (true);
> > > > 
> > > >           return error;
> > > > }
> > > > 
> > > > That has the added benefit of eliminating the whole err2 pattern, which
> > > > always strikes me as a landmine.
> > > > 
> > > > > +		if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
> > > > 
> > > > BTW, _FINISH_TRANS also seems misnamed given that we finish deferred
> > > > operations, not necessarily the transaction. XFS_DAC_DEFER_FINISH?
> > > Sure, will update
> > > 
> > > > 
> > > > > +			args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
> > > > > +
> > > > > +			err2 = xfs_defer_finish(&args->trans);
> > > > > +			if (err2) {
> > > > > +				error = err2;
> > > > > +				goto out;
> > > > > +			}
> > > > > +		}
> > > > > +
> > > > > +		err2 = xfs_trans_roll_inode(&args->trans, args->dp);
> > > > > +		if (err2) {
> > > > > +			error = err2;
> > > > > +			goto out;
> > > > > +		}
> > > > > +
> > > > > +	} while (error == -EAGAIN);
> > > > > +out:
> > > > > +	return error;
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * Remove the attribute specified in @args.
> > > > > + *
> > > > > + * This function may return -EAGAIN to signal that the transaction needs to be
> > > > > + * rolled.  Callers should continue calling this function until they receive a
> > > > > + * return value other than -EAGAIN.
> > > > > + */
> > > > > +int
> > > > > +xfs_attr_remove_iter(
> > > > >    	struct xfs_da_args      *args)
> > > > >    {
> > > > >    	struct xfs_inode	*dp = args->dp;
> > > > >    	int			error;
> > > > > +	/* State machine switch */
> > > > > +	switch (args->dac.dela_state) {
> > > > > +	case XFS_DAS_RM_SHRINK:
> > > > > +	case XFS_DAS_RMTVAL_REMOVE:
> > > > > +		goto node;
> > > > > +	default:
> > > > > +		break;
> > > > > +	}
> > > > > +
> > > > >    	if (!xfs_inode_hasattr(dp)) {
> > > > >    		error = -ENOATTR;
> > > > >    	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> > > > > @@ -381,6 +430,7 @@ xfs_attr_remove_args(
> > > > >    	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> > > > >    		error = xfs_attr_leaf_removename(args);
> > > > >    	} else {
> > > > > +node:
> > > > >    		error = xfs_attr_node_removename(args);
> > > > >    	}
> > > > > @@ -895,9 +945,8 @@ xfs_attr_leaf_removename(
> > > > >    		/* bp is gone due to xfs_da_shrink_inode */
> > > > >    		if (error)
> > > > >    			return error;
> > > > > -		error = xfs_defer_finish(&args->trans);
> > > > > -		if (error)
> > > > > -			return error;
> > > > > +
> > > > > +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
> > > > >    	}
> > > > >    	return 0;
> > > > >    }
> > > > > @@ -1218,6 +1267,11 @@ xfs_attr_node_addname(
> > > > >     * This will involve walking down the Btree, and may involve joining
> > > > >     * leaf nodes and even joining intermediate nodes up to and including
> > > > >     * the root node (a special case of an intermediate node).
> > > > > + *
> > > > > + * This routine is meant to function as either an inline or delayed operation,
> > > > > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > > > > + * functions will need to handle this, and recall the function until a
> > > > > + * successful error code is returned.
> > > > >     */
> > > > >    STATIC int
> > > > >    xfs_attr_node_removename(
> > > > > @@ -1230,10 +1284,24 @@ xfs_attr_node_removename(
> > > > >    	struct xfs_inode	*dp = args->dp;
> > > > >    	trace_xfs_attr_node_removename(args);
> > > > > +	state = args->dac.da_state;
> > > > > +	blk = args->dac.blk;
> > > > > +
> > > > > +	/* State machine switch */
> > > > > +	switch (args->dac.dela_state) {
> > > > > +	case XFS_DAS_RMTVAL_REMOVE:
> > > > > +		goto rm_node_blks;
> > > > > +	case XFS_DAS_RM_SHRINK:
> > > > > +		goto rm_shrink;
> > > > > +	default:
> > > > > +		break;
> > > > > +	}
> > > > >    	error = xfs_attr_node_hasname(args, &state);
> > > > >    	if (error != -EEXIST)
> > > > >    		goto out;
> > > > > +	else
> > > > > +		error = 0;
> > > > 
> > > > This doesn't look necessary.
> > > Well, at this point error has to be -EEXIST.  Which is great because we need
> > > the attr to exist, but we don't want to return that as error for this
> > > function.  Which can happen if error is not otherwise set.
> > > 
> > 
> > AFAICT every codepath after this assigns error one way or another before
> > it's returned. There's another error = 0 assignment just before the out:
> > label.
> Ok, I see it.  Will remove.
> 
> > 
> > > > 
> > > > >    	/*
> > > > >    	 * If there is an out-of-line value, de-allocate the blocks.
> > > > > @@ -1243,6 +1311,14 @@ xfs_attr_node_removename(
> > > > >    	blk = &state->path.blk[ state->path.active-1 ];
> > > > >    	ASSERT(blk->bp != NULL);
> > > > >    	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
> > > > > +
> > > > > +	/*
> > > > > +	 * Store blk and state in the context incase we need to cycle out the
> > > > > +	 * transaction
> > > > > +	 */
> > > > > +	args->dac.blk = blk;
> > > > > +	args->dac.da_state = state;
> > > > > +
> > > > >    	if (args->rmtblkno > 0) {
> > > > >    		/*
> > > > >    		 * Fill in disk block numbers in the state structure
> > > > > @@ -1261,13 +1337,21 @@ xfs_attr_node_removename(
> > > > >    		if (error)
> > > > >    			goto out;
> > > > > -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > > > +		error = xfs_attr_rmtval_invalidate(args);
> > > > 
> > > > Remind me why we lose the above trans roll? I vaguely recall that this
> > > > was intentional, but I could be mistaken...
> > > I think we removed it in v5.  We used to have an XFS_DAS_RM_INVALIDATE
> > > state, but then we reasoned that because these are just in-core changes, we
> > > didn't need it, so we eliminated this state entirely.
> > > 
> > > Maybe I just add a comment here?  Just as a reminder.
> > > 
> > 
> > Ah, Ok. Normally I'd say document things like this in the commit log so
> > we don't lose track, though I don't know how much space we have there.
> > ;)
> Ok, I'll see if I can squeeze in a few more lines :-)
> 
> > 
> > > > 
> > > > >    		if (error)
> > > > >    			goto out;
> > > > > +	}
> > > > > -		error = xfs_attr_rmtval_remove(args);
> > > > > -		if (error)
> > > > > -			goto out;
> > > > > +rm_node_blks:
> > > > > +
> > > > > +	if (args->rmtblkno > 0) {
> > > > > +		error = xfs_attr_rmtval_unmap(args);
> > > > > +
> > > > > +		if (error) {
> > > > > +			if (error == -EAGAIN)
> > > > > +				args->dac.dela_state = XFS_DAS_RMTVAL_REMOVE;
> > > > 
> > > > Might be helpful for the code labels to match the state names. I.e., use
> > > > das_rmtval_remove: for the label above.
> > > Sure, I can update to add the das prefix.
> > > 
> > > > 
> > > > > +			return error;
> > > > > +		}
> > > > >    		/*
> > > > >    		 * Refill the state structure with buffers, the prior calls
> > > > > @@ -1293,17 +1377,15 @@ xfs_attr_node_removename(
> > > > >    		error = xfs_da3_join(state);
> > > > >    		if (error)
> > > > >    			goto out;
> > > > > -		error = xfs_defer_finish(&args->trans);
> > > > > -		if (error)
> > > > > -			goto out;
> > > > > -		/*
> > > > > -		 * Commit the Btree join operation and start a new trans.
> > > > > -		 */
> > > > > -		error = xfs_trans_roll_inode(&args->trans, dp);
> > > > > -		if (error)
> > > > > -			goto out;
> > > > > +
> > > > > +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
> > > > > +		args->dac.dela_state = XFS_DAS_RM_SHRINK;
> > > > > +		return -EAGAIN;
> > > > >    	}
> > > > > +rm_shrink:
> > > > > +	args->dac.dela_state = XFS_DAS_RM_SHRINK;
> > > > > +
> > > > 
> > > > There's an xfs_defer_finish() call further down this function. Should
> > > > that be replaced with the flag?
> > > > 
> > > > Finally, I mentioned in a previous review that this function should
> > > > probably be further broken down before fitting in the state management
> > > > stuff. It doesn't look like that happened so I've attached a diff that
> > > > is just intended to give an idea of what I mean by sectioning off the
> > > > hunks that might be able to break down into helpers. The helpers
> > > > wouldn't contain any state management, so we create a clear separation
> > > > between the state code and functional components.
> > > Yes, it's xfs_attr_node_shrink in patch 15.  I moved it to another patch to
> > > try and keep the activity in this one to a minimum.  Apologies if it
> > > surprised you!  And then I mistakenly had taken the XFS_DAC_FINISH_TRANS
> > > flag with it.  I meant to keep all the state machine stuff here.  Will fix!
> > > 
> > 
> > Ok, I might have just not got there yet.
> > 
> > > I think this initial
> > > > refactoring would make the introduction of state much more simple
> > > 
> > > I guess I didn't think people would be partial to introducing helpers before
> > > or after the state logic.  I put them after in this set because the states
> > > are visible now, so I thought it would make the goal of modularizing code
> > > between the states more clear to folks.  Do you think I should move it back
> > > behind the state machine patches?
> > > 
> > 
> > I do think the refactoring should be done first. This does make it more
> > challenging for the developer (IMO) because I know I'd probably have to
> > hack around with the state bits to have a better idea of how to refactor
> > things in some cases, and then go back and retrofit the refactoring.
> > 
> > The advantage is that the heavy lifting in this series becomes agnostic
> > to the state bits. Refactoring patches are easier to review and we can
> > make progress because there's less of a need to carry those out of tree
> > through however many versions of the state code we'll need before
> > getting it merged. Once the code is sufficiently factored, the state
> > code should be much simpler to introduce and review since we hopefully
> > won't be jumping around into the middle of functions, multiple branches
> > of logic deep, etc.
> > 
> > (I see Dave commented similarly on a couple of the subsequent patches. I
> > 100% agree with the approach he describes there and that is similar to
> > what I was trying to describe with the diff I attached in my earlier
> > mail...)
> > 
> > Brian
> 
> Alrighty then, will move back.  Thanks, and thanks again for the reviews!!
> 
> Allison
> 
> > 
> > > (and
> > > > perhaps alleviate the need for the huge diagram).
> > > Well, I get the impression that people find the series sort of scary and
> > > maybe the diagrams help them a bit.  Maybe we can take them out later after
> > > people feel like they are comfortable with things?
> > > 
> > > It might also be
> > > > interesting to see how much of the result could be folded up further
> > > > into _removename_iter()...
> > > 
> > > Yes, I think that is the goal we're reaching for.  I will add the other
> > > helpers I see in your diff too.
> > > 
> > > Thanks for the reviews!
> > > Allison
> > > 
> > > > 
> > > > Brian
> > > > 
> > > > >    	/*
> > > > >    	 * If the result is small enough, push it all into the inode.
> > > > >    	 */
> > > > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > > > index ce7b039..ea873a5 100644
> > > > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > > > @@ -155,6 +155,7 @@ int xfs_attr_set_args(struct xfs_da_args *args);
> > > > >    int xfs_attr_remove(struct xfs_inode *dp, struct xfs_name *name, int flags);
> > > > >    int xfs_has_attr(struct xfs_da_args *args);
> > > > >    int xfs_attr_remove_args(struct xfs_da_args *args);
> > > > > +int xfs_attr_remove_iter(struct xfs_da_args *args);
> > > > >    int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
> > > > >    		  int flags, struct attrlist_cursor_kern *cursor);
> > > > >    bool xfs_attr_namecheck(const void *name, size_t length);
> > > > > diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
> > > > > index 14f1be3..3c78498 100644
> > > > > --- a/fs/xfs/libxfs/xfs_da_btree.h
> > > > > +++ b/fs/xfs/libxfs/xfs_da_btree.h
> > > > > @@ -50,9 +50,39 @@ enum xfs_dacmp {
> > > > >    };
> > > > >    /*
> > > > > + * Enum values for xfs_delattr_context.da_state
> > > > > + *
> > > > > + * These values are used by delayed attribute operations to keep track  of where
> > > > > + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> > > > > + * calling function to roll the transaction, and then recall the subroutine to
> > > > > + * finish the operation.  The enum is then used by the subroutine to jump back
> > > > > + * to where it was and resume executing where it left off.
> > > > > + */
> > > > > +enum xfs_delattr_state {
> > > > > +	XFS_DAS_RM_SHRINK,	/* We are shrinking the tree */
> > > > > +	XFS_DAS_RMTVAL_REMOVE,	/* We are removing remote value blocks */
> > > > > +};
> > > > > +
> > > > > +/*
> > > > > + * Defines for xfs_delattr_context.flags
> > > > > + */
> > > > > +#define	XFS_DAC_FINISH_TRANS	0x1 /* indicates to finish the transaction */
> > > > > +
> > > > > +/*
> > > > > + * Context used for keeping track of delayed attribute operations
> > > > > + */
> > > > > +struct xfs_delattr_context {
> > > > > +	struct xfs_da_state	*da_state;
> > > > > +	struct xfs_da_state_blk *blk;
> > > > > +	unsigned int		flags;
> > > > > +	enum xfs_delattr_state	dela_state;
> > > > > +};
> > > > > +
> > > > > +/*
> > > > >     * Structure to ease passing around component names.
> > > > >     */
> > > > >    typedef struct xfs_da_args {
> > > > > +	struct xfs_delattr_context dac; /* context used for delay attr ops */
> > > > >    	struct xfs_da_geometry *geo;	/* da block geometry */
> > > > >    	struct xfs_name	name;		/* name, length and argument  flags*/
> > > > >    	uint8_t		filetype;	/* filetype of inode for directories */
> > > > > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > > > > index 1887605..9a649d1 100644
> > > > > --- a/fs/xfs/scrub/common.c
> > > > > +++ b/fs/xfs/scrub/common.c
> > > > > @@ -24,6 +24,8 @@
> > > > >    #include "xfs_rmap_btree.h"
> > > > >    #include "xfs_log.h"
> > > > >    #include "xfs_trans_priv.h"
> > > > > +#include "xfs_da_format.h"
> > > > > +#include "xfs_da_btree.h"
> > > > >    #include "xfs_attr.h"
> > > > >    #include "xfs_reflink.h"
> > > > >    #include "scrub/scrub.h"
> > > > > diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> > > > > index 42ac847..d65e6d8 100644
> > > > > --- a/fs/xfs/xfs_acl.c
> > > > > +++ b/fs/xfs/xfs_acl.c
> > > > > @@ -10,6 +10,8 @@
> > > > >    #include "xfs_trans_resv.h"
> > > > >    #include "xfs_mount.h"
> > > > >    #include "xfs_inode.h"
> > > > > +#include "xfs_da_format.h"
> > > > > +#include "xfs_da_btree.h"
> > > > >    #include "xfs_attr.h"
> > > > >    #include "xfs_trace.h"
> > > > >    #include "xfs_error.h"
> > > > > diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> > > > > index d37743b..881b9a4 100644
> > > > > --- a/fs/xfs/xfs_attr_list.c
> > > > > +++ b/fs/xfs/xfs_attr_list.c
> > > > > @@ -12,6 +12,7 @@
> > > > >    #include "xfs_trans_resv.h"
> > > > >    #include "xfs_mount.h"
> > > > >    #include "xfs_da_format.h"
> > > > > +#include "xfs_da_btree.h"
> > > > >    #include "xfs_inode.h"
> > > > >    #include "xfs_trans.h"
> > > > >    #include "xfs_bmap.h"
> > > > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > > > index 28c07c9..7c1d9da 100644
> > > > > --- a/fs/xfs/xfs_ioctl.c
> > > > > +++ b/fs/xfs/xfs_ioctl.c
> > > > > @@ -15,6 +15,8 @@
> > > > >    #include "xfs_iwalk.h"
> > > > >    #include "xfs_itable.h"
> > > > >    #include "xfs_error.h"
> > > > > +#include "xfs_da_format.h"
> > > > > +#include "xfs_da_btree.h"
> > > > >    #include "xfs_attr.h"
> > > > >    #include "xfs_bmap.h"
> > > > >    #include "xfs_bmap_util.h"
> > > > > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > > > > index 769581a..d504f8f 100644
> > > > > --- a/fs/xfs/xfs_ioctl32.c
> > > > > +++ b/fs/xfs/xfs_ioctl32.c
> > > > > @@ -17,6 +17,8 @@
> > > > >    #include "xfs_itable.h"
> > > > >    #include "xfs_fsops.h"
> > > > >    #include "xfs_rtalloc.h"
> > > > > +#include "xfs_da_format.h"
> > > > > +#include "xfs_da_btree.h"
> > > > >    #include "xfs_attr.h"
> > > > >    #include "xfs_ioctl.h"
> > > > >    #include "xfs_ioctl32.h"
> > > > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > > > index e85bbf5..a2d299f 100644
> > > > > --- a/fs/xfs/xfs_iops.c
> > > > > +++ b/fs/xfs/xfs_iops.c
> > > > > @@ -13,6 +13,8 @@
> > > > >    #include "xfs_inode.h"
> > > > >    #include "xfs_acl.h"
> > > > >    #include "xfs_quota.h"
> > > > > +#include "xfs_da_format.h"
> > > > > +#include "xfs_da_btree.h"
> > > > >    #include "xfs_attr.h"
> > > > >    #include "xfs_trans.h"
> > > > >    #include "xfs_trace.h"
> > > > > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> > > > > index 74133a5..d8dc72d 100644
> > > > > --- a/fs/xfs/xfs_xattr.c
> > > > > +++ b/fs/xfs/xfs_xattr.c
> > > > > @@ -10,6 +10,7 @@
> > > > >    #include "xfs_log_format.h"
> > > > >    #include "xfs_da_format.h"
> > > > >    #include "xfs_inode.h"
> > > > > +#include "xfs_da_btree.h"
> > > > >    #include "xfs_attr.h"
> > > > >    #include "xfs_acl.h"
> > > > > -- 
> > > > > 2.7.4
> > > > > 
> > > > 
> > > 
> > 
>
Allison Henderson Feb. 26, 2020, 7:23 p.m. UTC | #10
On 2/26/20 6:48 AM, Brian Foster wrote:
> On Tue, Feb 25, 2020 at 10:36:18PM -0700, Allison Collins wrote:
>>
>>
>> On 2/25/20 6:34 AM, Brian Foster wrote:
>>> On Mon, Feb 24, 2020 at 04:14:48PM -0700, Allison Collins wrote:
>>>> On 2/24/20 8:25 AM, Brian Foster wrote:
>>>>> On Sat, Feb 22, 2020 at 07:06:05PM -0700, Allison Collins wrote:
>>>>>> This patch modifies the attr remove routines to be delay ready. This means they no
>>>>>> longer roll or commit transactions, but instead return -EAGAIN to have the calling
>>>>>> routine roll and refresh the transaction. In this series, xfs_attr_remove_args has
>>>>>> become xfs_attr_remove_iter, which uses a sort of state machine like switch to keep
>>>>>> track of where it was when EAGAIN was returned. xfs_attr_node_removename has also
>>>>>> been modified to use the switch, and a  new version of xfs_attr_remove_args
>>>>>> consists of a simple loop to refresh the transaction until the operation is
>>>>>> completed.
>>>>>>
>>>>>> This patch also adds a new struct xfs_delattr_context, which we will use to keep
>>>>>> track of the current state of an attribute operation. The new xfs_delattr_state
>>>>>> enum is used to track various operations that are in progress so that we know not
>>>>>> to repeat them, and resume where we left off before EAGAIN was returned to cycle
>>>>>> out the transaction. Other members take the place of local variables that need
>>>>>> to retain their values across multiple function recalls.
>>>>>>
>>>>>> Below is a state machine diagram for attr remove operations. The XFS_DAS_* states
>>>>>> indicate places where the function would return -EAGAIN, and then immediately
>>>>>> resume from after being recalled by the calling function.  States marked as a
>>>>>> "subroutine state" indicate that they belong to a subroutine, and so the calling
>>>>>> function needs to pass them back to that subroutine to allow it to finish where
>>>>>> it left off. But they otherwise do not have a role in the calling function other
>>>>>> than just passing through.
>>>>>>
>>>>>>     xfs_attr_remove_iter()
>>>>>>             XFS_DAS_RM_SHRINK     ─┐
>>>>>>             (subroutine state)     │
>>>>>>                                    │
>>>>>>             XFS_DAS_RMTVAL_REMOVE ─┤
>>>>>>             (subroutine state)     │
>>>>>>                                    └─>xfs_attr_node_removename()
>>>>>>                                                     │
>>>>>>                                                     v
>>>>>>                                             need to remove
>>>>>>                                       ┌─n──  rmt blocks?
>>>>>>                                       │             │
>>>>>>                                       │             y
>>>>>>                                       │             │
>>>>>>                                       │             v
>>>>>>                                       │  ┌─>XFS_DAS_RMTVAL_REMOVE
>>>>>>                                       │  │          │
>>>>>>                                       │  │          v
>>>>>>                                       │  └──y── more blks
>>>>>>                                       │         to remove?
>>>>>>                                       │             │
>>>>>>                                       │             n
>>>>>>                                       │             │
>>>>>>                                       │             v
>>>>>>                                       │         need to
>>>>>>                                       └─────> shrink tree? ─n─┐
>>>>>>                                                     │         │
>>>>>>                                                     y         │
>>>>>>                                                     │         │
>>>>>>                                                     v         │
>>>>>>                                             XFS_DAS_RM_SHRINK │
>>>>>>                                                     │         │
>>>>>>                                                     v         │
>>>>>>                                                    done <─────┘
>>>>>>
>>>>>
>>>>> Wow. :P I guess I have nothing against verbose commit logs, but I wonder
>>>>> how useful this level of documentation is for a patch that shouldn't
>>>>> really change the existing flow of the operation.
>>>>
>>>> Yes Darrick had requested a diagram in the last review, so I had put this
>>>> together.  I wasn't sure where the best place to put it even was, so I put it
>>>> here at least for now.  I have no idea if there is a limit on commit message
>>>> length, but if there is, I'm pretty sure I blew right past it in this patch
>>>> and the next.  Maybe if anything it can just be here for now while we work
>>>> through things?
>>>>
>>>
>>> No problem.. if it's useful it's good to have a record of it around
>>> somewhere until the end result is more stabilized and we can determine
>>> whether this warrants a permanent home somewhere in the code.
>>>
>>>>>
>>>>>> Signed-off-by: Allison Collins <allison.henderson@oracle.com>
>>>>>> ---
>>>>>>     fs/xfs/libxfs/xfs_attr.c     | 114 +++++++++++++++++++++++++++++++++++++------
>>>>>>     fs/xfs/libxfs/xfs_attr.h     |   1 +
>>>>>>     fs/xfs/libxfs/xfs_da_btree.h |  30 ++++++++++++
>>>>>>     fs/xfs/scrub/common.c        |   2 +
>>>>>>     fs/xfs/xfs_acl.c             |   2 +
>>>>>>     fs/xfs/xfs_attr_list.c       |   1 +
>>>>>>     fs/xfs/xfs_ioctl.c           |   2 +
>>>>>>     fs/xfs/xfs_ioctl32.c         |   2 +
>>>>>>     fs/xfs/xfs_iops.c            |   2 +
>>>>>>     fs/xfs/xfs_xattr.c           |   1 +
>>>>>>     10 files changed, 141 insertions(+), 16 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>>>>>> index 5d73bdf..cd3a3f7 100644
>>>>>> --- a/fs/xfs/libxfs/xfs_attr.c
>>>>>> +++ b/fs/xfs/libxfs/xfs_attr.c
>>>>>> @@ -368,11 +368,60 @@ xfs_has_attr(
>>>>>>      */
>>>>>>     int
>>>>>>     xfs_attr_remove_args(
>>>>>> +	struct xfs_da_args	*args)
>>>>>> +{
>>>>>> +	int			error = 0;
>>>>>> +	int			err2 = 0;
>>>>>> +
>>>>>> +	do {
>>>>>> +		error = xfs_attr_remove_iter(args);
>>>>>> +		if (error && error != -EAGAIN)
>>>>>> +			goto out;
>>>>>> +
>>>>>
>>>>> I'm a little confused on the logic of this loop given that the only
>>>>> caller commits the transaction (which also finishes dfops). IOW, it
>>>>> seems we shouldn't ever need to finish/roll when error != -EAGAIN. If
>>>>> that is the case, this can be simplified to something like:
>>>> Well, we need to do it when error == -EAGAIN or 0, right? Which I think
>>>> better imitates the defer_finish routines.  That's why a lot of the existing
>>>> code that just finishes off with a transaction just sort of gets sawed off
>>>> at the end. Otherwise they would need one more state just to return -EAGAIN
>>>> as the last thing they have to do. Did that make sense?
>>>>
>>>
>>> Hmm.. I could just be missing something or not far along enough in the
>>> series. Can you point me at an example of where we need to finish/roll
>>> before the caller of xfs_attr_remove_args() commits the transaction?
>>>
>> Ok, in looking for an example, I realized all such examples appear in the next
>> patch ;-)  So maybe we can get away with simplifying it in this patch.
>>
> 
> Ah, Ok. Yeah, I think that would be best so long as it is correct, since
> right now at least we have separate xfs_attr_[set|remove]_args() loop
> functions and I didn't see any code that warranted the extra roll in the
> remove path.
> 
>> For the next patch though, it's any place the roll/finish disappears, and an
>> "return -EAGAIN" does not.  For example, at the end of
>> xfs_attr_leaf_addname.
>>
> 
> I see, thanks. Hmmm... so I think that particular example is basically a
> programming pattern thing more so than a functional requirement. I.e.,
> the current _clearflag() function clears the flag and rolls the
> transaction perhaps simply so it can be reliably used in different
> contexts. The use in the _addname() case is functionally spurious afaict
> because we roll the transaction only to make no further changes and then
> commit the final transaction in the higher level code.
> 
> I could see leaving the loop as is if this were the case for every exit
> path back to xfs_attr_set_args(), but is that really the case? If not,
> haven't we introduced a spurious roll for any zero return back to the
> _args() function? I think it might be best to fix up the loop to not
> roll on error == 0, explicitly plumb in the -EAGAIN in those spurious
> cases like _addname() where we currently roll, and then come up with a
> follow up patch to remove the ones that end up as spurious. That way
> we're not conflating too much refactoring with functional change and can
> review/document the functional change independently (i.e., if removing
> one of those rolls ends up introducing a bug, we don't have to revert an
> entire refactoring patch to restore original behavior).
> 
> Now that I think of it, the better option is probably to remove the
> xfs_trans_roll_inode() call from _addname() first, before these patches
> introduce the delay ready infrastructure, since it's already isolated as
> spurious at that point. That should be a simple patch with a
> clear/obvious explanation.
Alrighty then, got it.  I will add these suggestions into the next 
version.  Thanks again for all the reviews!

Allison

> 
> Brian
> 
>>>>>
>>>>> int
>>>>> xfs_attr_remove_args(
>>>>>            struct xfs_da_args      *args)
>>>>> {
>>>>>            int                     error;
>>>>>
>>>>>            do {
>>>>>                    error = xfs_attr_remove_iter(args);
>>>>>                    if (error != -EAGAIN)
>>>>>                            break;
>>>>>
>>>>>                    if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
>>>>>                            args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
>>>>>                            error = xfs_defer_finish(&args->trans);
>>>>>                            if (error)
>>>>>                                    break;
>>>>>                    }
>>>>>
>>>>>                    error = xfs_trans_roll_inode(&args->trans, args->dp);
>>>>>                    if (error)
>>>>>                            break;
>>>>>            } while (true);
>>>>>
>>>>>            return error;
>>>>> }
>>>>>
>>>>> That has the added benefit of eliminating the whole err2 pattern, which
>>>>> always strikes me as a landmine.
>>>>>
>>>>>> +		if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
>>>>>
>>>>> BTW, _FINISH_TRANS also seems misnamed given that we finish deferred
>>>>> operations, not necessarily the transaction. XFS_DAC_DEFER_FINISH?
>>>> Sure, will update
>>>>
>>>>>
>>>>>> +			args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
>>>>>> +
>>>>>> +			err2 = xfs_defer_finish(&args->trans);
>>>>>> +			if (err2) {
>>>>>> +				error = err2;
>>>>>> +				goto out;
>>>>>> +			}
>>>>>> +		}
>>>>>> +
>>>>>> +		err2 = xfs_trans_roll_inode(&args->trans, args->dp);
>>>>>> +		if (err2) {
>>>>>> +			error = err2;
>>>>>> +			goto out;
>>>>>> +		}
>>>>>> +
>>>>>> +	} while (error == -EAGAIN);
>>>>>> +out:
>>>>>> +	return error;
>>>>>> +}
>>>>>> +
>>>>>> +/*
>>>>>> + * Remove the attribute specified in @args.
>>>>>> + *
>>>>>> + * This function may return -EAGAIN to signal that the transaction needs to be
>>>>>> + * rolled.  Callers should continue calling this function until they receive a
>>>>>> + * return value other than -EAGAIN.
>>>>>> + */
>>>>>> +int
>>>>>> +xfs_attr_remove_iter(
>>>>>>     	struct xfs_da_args      *args)
>>>>>>     {
>>>>>>     	struct xfs_inode	*dp = args->dp;
>>>>>>     	int			error;
>>>>>> +	/* State machine switch */
>>>>>> +	switch (args->dac.dela_state) {
>>>>>> +	case XFS_DAS_RM_SHRINK:
>>>>>> +	case XFS_DAS_RMTVAL_REMOVE:
>>>>>> +		goto node;
>>>>>> +	default:
>>>>>> +		break;
>>>>>> +	}
>>>>>> +
>>>>>>     	if (!xfs_inode_hasattr(dp)) {
>>>>>>     		error = -ENOATTR;
>>>>>>     	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
>>>>>> @@ -381,6 +430,7 @@ xfs_attr_remove_args(
>>>>>>     	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>>>>>>     		error = xfs_attr_leaf_removename(args);
>>>>>>     	} else {
>>>>>> +node:
>>>>>>     		error = xfs_attr_node_removename(args);
>>>>>>     	}
>>>>>> @@ -895,9 +945,8 @@ xfs_attr_leaf_removename(
>>>>>>     		/* bp is gone due to xfs_da_shrink_inode */
>>>>>>     		if (error)
>>>>>>     			return error;
>>>>>> -		error = xfs_defer_finish(&args->trans);
>>>>>> -		if (error)
>>>>>> -			return error;
>>>>>> +
>>>>>> +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
>>>>>>     	}
>>>>>>     	return 0;
>>>>>>     }
>>>>>> @@ -1218,6 +1267,11 @@ xfs_attr_node_addname(
>>>>>>      * This will involve walking down the Btree, and may involve joining
>>>>>>      * leaf nodes and even joining intermediate nodes up to and including
>>>>>>      * the root node (a special case of an intermediate node).
>>>>>> + *
>>>>>> + * This routine is meant to function as either an inline or delayed operation,
>>>>>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>>>>>> + * functions will need to handle this, and recall the function until a
>>>>>> + * successful error code is returned.
>>>>>>      */
>>>>>>     STATIC int
>>>>>>     xfs_attr_node_removename(
>>>>>> @@ -1230,10 +1284,24 @@ xfs_attr_node_removename(
>>>>>>     	struct xfs_inode	*dp = args->dp;
>>>>>>     	trace_xfs_attr_node_removename(args);
>>>>>> +	state = args->dac.da_state;
>>>>>> +	blk = args->dac.blk;
>>>>>> +
>>>>>> +	/* State machine switch */
>>>>>> +	switch (args->dac.dela_state) {
>>>>>> +	case XFS_DAS_RMTVAL_REMOVE:
>>>>>> +		goto rm_node_blks;
>>>>>> +	case XFS_DAS_RM_SHRINK:
>>>>>> +		goto rm_shrink;
>>>>>> +	default:
>>>>>> +		break;
>>>>>> +	}
>>>>>>     	error = xfs_attr_node_hasname(args, &state);
>>>>>>     	if (error != -EEXIST)
>>>>>>     		goto out;
>>>>>> +	else
>>>>>> +		error = 0;
>>>>>
>>>>> This doesn't look necessary.
>>>> Well, at this point error has to be -EEXIST.  Which is great because we need
>>>> the attr to exist, but we don't want to return that as error for this
>>>> function.  Which can happen if error is not otherwise set.
>>>>
>>>
>>> AFAICT every codepath after this assigns error one way or another before
>>> it's returned. There's another error = 0 assignment just before the out:
>>> label.
>> Ok, I see it.  Will remove.
>>
>>>
>>>>>
>>>>>>     	/*
>>>>>>     	 * If there is an out-of-line value, de-allocate the blocks.
>>>>>> @@ -1243,6 +1311,14 @@ xfs_attr_node_removename(
>>>>>>     	blk = &state->path.blk[ state->path.active-1 ];
>>>>>>     	ASSERT(blk->bp != NULL);
>>>>>>     	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * Store blk and state in the context incase we need to cycle out the
>>>>>> +	 * transaction
>>>>>> +	 */
>>>>>> +	args->dac.blk = blk;
>>>>>> +	args->dac.da_state = state;
>>>>>> +
>>>>>>     	if (args->rmtblkno > 0) {
>>>>>>     		/*
>>>>>>     		 * Fill in disk block numbers in the state structure
>>>>>> @@ -1261,13 +1337,21 @@ xfs_attr_node_removename(
>>>>>>     		if (error)
>>>>>>     			goto out;
>>>>>> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
>>>>>> +		error = xfs_attr_rmtval_invalidate(args);
>>>>>
>>>>> Remind me why we lose the above trans roll? I vaguely recall that this
>>>>> was intentional, but I could be mistaken...
>>>> I think we removed it in v5.  We used to have an XFS_DAS_RM_INVALIDATE
>>>> state, but then we reasoned that because these are just in-core changes, we
>>>> didn't need it, so we eliminated this state entirely.
>>>>
>>>> Maybe I just add a comment here?  Just as a reminder
>>>>
>>>
>>> Ah, Ok. Normally I'd say document things like this in the commit log so
>>> we don't lose track, though I don't know how much space we have there.
>>> ;)
>> Ok, I'll see if I can squeeze in a few more lines :-)
>>
>>>
>>>>>
>>>>>>     		if (error)
>>>>>>     			goto out;
>>>>>> +	}
>>>>>> -		error = xfs_attr_rmtval_remove(args);
>>>>>> -		if (error)
>>>>>> -			goto out;
>>>>>> +rm_node_blks:
>>>>>> +
>>>>>> +	if (args->rmtblkno > 0) {
>>>>>> +		error = xfs_attr_rmtval_unmap(args);
>>>>>> +
>>>>>> +		if (error) {
>>>>>> +			if (error == -EAGAIN)
>>>>>> +				args->dac.dela_state = XFS_DAS_RMTVAL_REMOVE;
>>>>>
>>>>> Might be helpful for the code labels to match the state names. I.e., use
>>>>> das_rmtval_remove: for the label above.
>>>> Sure, I can update it to add the das prefix.
>>>>
>>>>>
>>>>>> +			return error;
>>>>>> +		}
>>>>>>     		/*
>>>>>>     		 * Refill the state structure with buffers, the prior calls
>>>>>> @@ -1293,17 +1377,15 @@ xfs_attr_node_removename(
>>>>>>     		error = xfs_da3_join(state);
>>>>>>     		if (error)
>>>>>>     			goto out;
>>>>>> -		error = xfs_defer_finish(&args->trans);
>>>>>> -		if (error)
>>>>>> -			goto out;
>>>>>> -		/*
>>>>>> -		 * Commit the Btree join operation and start a new trans.
>>>>>> -		 */
>>>>>> -		error = xfs_trans_roll_inode(&args->trans, dp);
>>>>>> -		if (error)
>>>>>> -			goto out;
>>>>>> +
>>>>>> +		args->dac.flags |= XFS_DAC_FINISH_TRANS;
>>>>>> +		args->dac.dela_state = XFS_DAS_RM_SHRINK;
>>>>>> +		return -EAGAIN;
>>>>>>     	}
>>>>>> +rm_shrink:
>>>>>> +	args->dac.dela_state = XFS_DAS_RM_SHRINK;
>>>>>> +
>>>>>
>>>>> There's an xfs_defer_finish() call further down this function. Should
>>>>> that be replaced with the flag?
>>>>>
>>>>> Finally, I mentioned in a previous review that this function should
>>>>> probably be further broken down before fitting in the state management
>>>>> stuff. It doesn't look like that happened so I've attached a diff that
>>>>> is just intended to give an idea of what I mean by sectioning off the
>>>>> hunks that might be able to break down into helpers. The helpers
>>>>> wouldn't contain any state management, so we create a clear separation
>>>>> between the state code and functional components.
>>>> Yes, it's xfs_attr_node_shrink in patch 15.  I moved it to another patch to
>>>> try and keep the activity in this one to a minimum.  Apologies if it
>>>> surprised you!  And then I mistakenly had taken the XFS_DAC_FINISH_TRANS
>>>> flag with it.  I meant to keep all the state machine stuff here.  Will fix!
>>>>
>>>
>>> Ok, I might have just not got there yet.
>>>
>>>> I think this initial
>>>>> refactoring would make the introduction of state much more simple
>>>>
>>>> I guess I didn't think people would be partial to introducing helpers before
>>>> or after the state logic.  I put them after in this set because the states
>>>> are visible now, so I thought it would make the goal of modularizing code
>>>> between the states more clear to folks.  Do you think I should move it back
>>>> behind the state machine patches?
>>>>
>>>
>>> I do think the refactoring should be done first. This does make it more
>>> challenging for the developer (IMO) because I know I'd probably have to
>>> hack around with the state bits to have a better idea of how to refactor
>>> things in some cases, and then go back and retrofit the refactoring.
>>>
>>> The advantage is that the heavy lifting in this series becomes agnostic
>>> to the state bits. Refactoring patches are easier to review and we can
>>> make progress because there's less of a need to carry those out of tree
>>> through however many versions of the state code we'll need before
>>> getting it merged. Once the code is sufficiently factored, the state
>>> code should be much simpler to introduce and review since we hopefully
>>> won't be jumping around into the middle of functions, multiple branches
>>> of logic deep, etc.
>>>
>>> (I see Dave commented similarly on a couple of the subsequent patches. I
>>> 100% agree with the approach he describes there and that is similar to
>>> what I was trying to describe with the diff I attached in my earlier
>>> mail...)
>>>
>>> Brian
>>
>> Alrighty then, will move back.  Thanks, and thanks again for the reviews!!
>>
>> Allison
>>
>>>
>>>> (and
>>>>> perhaps alleviate the need for the huge diagram).
>>>> Well, I get the impression that people find the series sort of scary and
>>>> maybe the diagrams help them a bit.  Maybe we can take them out later after
>>>> people feel like they are comfortable with things?
>>>>
>>>> It might also be
>>>>> interesting to see how much of the result could be folded up further
>>>>> into _removename_iter()...
>>>>
>>>> Yes, I think that is the goal we're reaching for.  I will add the other
>>>> helpers I see in your diff too.
>>>>
>>>> Thanks for the reviews!
>>>> Allison
>>>>
>>>>>
>>>>> Brian
>>>>>
>>>>>>     	/*
>>>>>>     	 * If the result is small enough, push it all into the inode.
>>>>>>     	 */
>>>>>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>>>>>> index ce7b039..ea873a5 100644
>>>>>> --- a/fs/xfs/libxfs/xfs_attr.h
>>>>>> +++ b/fs/xfs/libxfs/xfs_attr.h
>>>>>> @@ -155,6 +155,7 @@ int xfs_attr_set_args(struct xfs_da_args *args);
>>>>>>     int xfs_attr_remove(struct xfs_inode *dp, struct xfs_name *name, int flags);
>>>>>>     int xfs_has_attr(struct xfs_da_args *args);
>>>>>>     int xfs_attr_remove_args(struct xfs_da_args *args);
>>>>>> +int xfs_attr_remove_iter(struct xfs_da_args *args);
>>>>>>     int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
>>>>>>     		  int flags, struct attrlist_cursor_kern *cursor);
>>>>>>     bool xfs_attr_namecheck(const void *name, size_t length);
>>>>>> diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
>>>>>> index 14f1be3..3c78498 100644
>>>>>> --- a/fs/xfs/libxfs/xfs_da_btree.h
>>>>>> +++ b/fs/xfs/libxfs/xfs_da_btree.h
>>>>>> @@ -50,9 +50,39 @@ enum xfs_dacmp {
>>>>>>     };
>>>>>>     /*
>>>>>> + * Enum values for xfs_delattr_context.da_state
>>>>>> + *
>>>>>> + * These values are used by delayed attribute operations to keep track  of where
>>>>>> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
>>>>>> + * calling function to roll the transaction, and then recall the subroutine to
>>>>>> + * finish the operation.  The enum is then used by the subroutine to jump back
>>>>>> + * to where it was and resume executing where it left off.
>>>>>> + */
>>>>>> +enum xfs_delattr_state {
>>>>>> +	XFS_DAS_RM_SHRINK,	/* We are shrinking the tree */
>>>>>> +	XFS_DAS_RMTVAL_REMOVE,	/* We are removing remote value blocks */
>>>>>> +};
>>>>>> +
>>>>>> +/*
>>>>>> + * Defines for xfs_delattr_context.flags
>>>>>> + */
>>>>>> +#define	XFS_DAC_FINISH_TRANS	0x1 /* indicates to finish the transaction */
>>>>>> +
>>>>>> +/*
>>>>>> + * Context used for keeping track of delayed attribute operations
>>>>>> + */
>>>>>> +struct xfs_delattr_context {
>>>>>> +	struct xfs_da_state	*da_state;
>>>>>> +	struct xfs_da_state_blk *blk;
>>>>>> +	unsigned int		flags;
>>>>>> +	enum xfs_delattr_state	dela_state;
>>>>>> +};
>>>>>> +
>>>>>> +/*
>>>>>>      * Structure to ease passing around component names.
>>>>>>      */
>>>>>>     typedef struct xfs_da_args {
>>>>>> +	struct xfs_delattr_context dac; /* context used for delay attr ops */
>>>>>>     	struct xfs_da_geometry *geo;	/* da block geometry */
>>>>>>     	struct xfs_name	name;		/* name, length and argument  flags*/
>>>>>>     	uint8_t		filetype;	/* filetype of inode for directories */
>>>>>> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
>>>>>> index 1887605..9a649d1 100644
>>>>>> --- a/fs/xfs/scrub/common.c
>>>>>> +++ b/fs/xfs/scrub/common.c
>>>>>> @@ -24,6 +24,8 @@
>>>>>>     #include "xfs_rmap_btree.h"
>>>>>>     #include "xfs_log.h"
>>>>>>     #include "xfs_trans_priv.h"
>>>>>> +#include "xfs_da_format.h"
>>>>>> +#include "xfs_da_btree.h"
>>>>>>     #include "xfs_attr.h"
>>>>>>     #include "xfs_reflink.h"
>>>>>>     #include "scrub/scrub.h"
>>>>>> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
>>>>>> index 42ac847..d65e6d8 100644
>>>>>> --- a/fs/xfs/xfs_acl.c
>>>>>> +++ b/fs/xfs/xfs_acl.c
>>>>>> @@ -10,6 +10,8 @@
>>>>>>     #include "xfs_trans_resv.h"
>>>>>>     #include "xfs_mount.h"
>>>>>>     #include "xfs_inode.h"
>>>>>> +#include "xfs_da_format.h"
>>>>>> +#include "xfs_da_btree.h"
>>>>>>     #include "xfs_attr.h"
>>>>>>     #include "xfs_trace.h"
>>>>>>     #include "xfs_error.h"
>>>>>> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
>>>>>> index d37743b..881b9a4 100644
>>>>>> --- a/fs/xfs/xfs_attr_list.c
>>>>>> +++ b/fs/xfs/xfs_attr_list.c
>>>>>> @@ -12,6 +12,7 @@
>>>>>>     #include "xfs_trans_resv.h"
>>>>>>     #include "xfs_mount.h"
>>>>>>     #include "xfs_da_format.h"
>>>>>> +#include "xfs_da_btree.h"
>>>>>>     #include "xfs_inode.h"
>>>>>>     #include "xfs_trans.h"
>>>>>>     #include "xfs_bmap.h"
>>>>>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>>>>>> index 28c07c9..7c1d9da 100644
>>>>>> --- a/fs/xfs/xfs_ioctl.c
>>>>>> +++ b/fs/xfs/xfs_ioctl.c
>>>>>> @@ -15,6 +15,8 @@
>>>>>>     #include "xfs_iwalk.h"
>>>>>>     #include "xfs_itable.h"
>>>>>>     #include "xfs_error.h"
>>>>>> +#include "xfs_da_format.h"
>>>>>> +#include "xfs_da_btree.h"
>>>>>>     #include "xfs_attr.h"
>>>>>>     #include "xfs_bmap.h"
>>>>>>     #include "xfs_bmap_util.h"
>>>>>> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
>>>>>> index 769581a..d504f8f 100644
>>>>>> --- a/fs/xfs/xfs_ioctl32.c
>>>>>> +++ b/fs/xfs/xfs_ioctl32.c
>>>>>> @@ -17,6 +17,8 @@
>>>>>>     #include "xfs_itable.h"
>>>>>>     #include "xfs_fsops.h"
>>>>>>     #include "xfs_rtalloc.h"
>>>>>> +#include "xfs_da_format.h"
>>>>>> +#include "xfs_da_btree.h"
>>>>>>     #include "xfs_attr.h"
>>>>>>     #include "xfs_ioctl.h"
>>>>>>     #include "xfs_ioctl32.h"
>>>>>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>>>>>> index e85bbf5..a2d299f 100644
>>>>>> --- a/fs/xfs/xfs_iops.c
>>>>>> +++ b/fs/xfs/xfs_iops.c
>>>>>> @@ -13,6 +13,8 @@
>>>>>>     #include "xfs_inode.h"
>>>>>>     #include "xfs_acl.h"
>>>>>>     #include "xfs_quota.h"
>>>>>> +#include "xfs_da_format.h"
>>>>>> +#include "xfs_da_btree.h"
>>>>>>     #include "xfs_attr.h"
>>>>>>     #include "xfs_trans.h"
>>>>>>     #include "xfs_trace.h"
>>>>>> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
>>>>>> index 74133a5..d8dc72d 100644
>>>>>> --- a/fs/xfs/xfs_xattr.c
>>>>>> +++ b/fs/xfs/xfs_xattr.c
>>>>>> @@ -10,6 +10,7 @@
>>>>>>     #include "xfs_log_format.h"
>>>>>>     #include "xfs_da_format.h"
>>>>>>     #include "xfs_inode.h"
>>>>>> +#include "xfs_da_btree.h"
>>>>>>     #include "xfs_attr.h"
>>>>>>     #include "xfs_acl.h"
>>>>>> -- 
>>>>>> 2.7.4
>>>>>>
>>>>>
>>>>
>>>
>>
>
Dave Chinner Feb. 26, 2020, 10:34 p.m. UTC | #11
On Tue, Feb 25, 2020 at 04:57:46PM -0800, Allison Collins wrote:
> On 2/25/20 1:57 AM, Dave Chinner wrote:
> > On Sat, Feb 22, 2020 at 07:06:05PM -0700, Allison Collins wrote:
> > > +out:
> > > +	return error;
> > > +}
> > 
> > Brian commented on the structure of this loop better than I could.
> > 
> > > +
> > > +/*
> > > + * Remove the attribute specified in @args.
> > > + *
> > > + * This function may return -EAGAIN to signal that the transaction needs to be
> > > + * rolled.  Callers should continue calling this function until they receive a
> > > + * return value other than -EAGAIN.
> > > + */
> > > +int
> > > +xfs_attr_remove_iter(
> > >   	struct xfs_da_args      *args)
> > >   {
> > >   	struct xfs_inode	*dp = args->dp;
> > >   	int			error;
> > > +	/* State machine switch */
> > > +	switch (args->dac.dela_state) {
> > > +	case XFS_DAS_RM_SHRINK:
> > > +	case XFS_DAS_RMTVAL_REMOVE:
> > > +		goto node;
> > > +	default:
> > > +		break;
> > > +	}
> > 
> > Why separate out the state machine? Doesn't this shortcut the
> > xfs_inode_hasattr() check? Shouldn't that come first?
> Well, the idea is that when we first start the routine, we come in with
> neither state set, and we fall through to the break.  So we execute the
> check the first time through.
> 
> Though now that you point it out, I should probably go back and put the
> explicit numbering back in the enum (starting with 1) or they will default
> to zero, which would be incorrect.  I had pulled it out in one of the last
> reviews thinking it would be ok, but it should go back in.
> 
> > 
> > As it is:
> > 
> > 	case XFS_DAS_RM_SHRINK:
> > 	case XFS_DAS_RMTVAL_REMOVE:
> > 		return xfs_attr_node_removename(args);
> > 	default:
> > 		break;
> > 
> > would be nicer, and if this is the only way we can get to
> > > xfs_attr_node_removename(), getting rid of it from the code
> > below could be done, too.
> Well, the remove path is a lot simpler than the set path, so that trick does
> work here :-)
> 
> The idea though was to establish "jump points" with the "XFS_DAS_*" states.
> Based on the state, we jump back to where we were.  We could break this
> pattern for the remove path, but I don't think we'd want to do the same for
> the others.  The set routine is a really big function that would end up
> being inside a really big switch!

Right, which is why I think it should be factored into function
calls first, then the switch statement simply becomes a small set of
function calls.

We use that pattern quite a bit in the da_btree code to call
the correct dir/attr function based on the type of block we are
manipulating (i.e. based on da_state context). e.g. xfs_da3_split(),
xfs_da3_join(), etc.
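
A minimal sketch of the shape described above, assuming the work has
already been split into helpers that contain no state logic. Every
name here (demo_ctx, demo_unmap_one, demo_shrink, demo_remove_iter,
demo_remove) is hypothetical and stands in for the real XFS routines;
the "roll" is just a counter, not a transaction roll.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical states; 0 is reserved for "operation not started". */
enum demo_state {
	DEMO_UNINIT = 0,
	DEMO_RMTVAL_REMOVE,
	DEMO_RM_SHRINK,
	DEMO_DONE,
};

struct demo_ctx {
	enum demo_state	state;
	int		rmt_blocks;	/* remote blocks still to unmap */
	int		rolls;		/* how often the caller "rolled" */
};

/* Helpers do one step of work each and know nothing about states. */
static int demo_unmap_one(struct demo_ctx *ctx)
{
	ctx->rmt_blocks--;
	return 0;
}

static int demo_shrink(struct demo_ctx *ctx)
{
	(void)ctx;	/* nothing to do in this sketch */
	return 0;
}

/* The iterator: the switch is only a small set of function calls. */
int demo_remove_iter(struct demo_ctx *ctx)
{
	int error;

	assert(ctx != NULL);

	if (ctx->state == DEMO_UNINIT)
		ctx->state = DEMO_RMTVAL_REMOVE;

	switch (ctx->state) {
	case DEMO_RMTVAL_REMOVE:
		if (ctx->rmt_blocks > 0) {
			error = demo_unmap_one(ctx);
			return error ? error : -EAGAIN;
		}
		ctx->state = DEMO_RM_SHRINK;
		return -EAGAIN;
	case DEMO_RM_SHRINK:
		error = demo_shrink(ctx);
		if (error)
			return error;
		ctx->state = DEMO_DONE;
		return 0;
	case DEMO_DONE:
		return 0;
	default:
		return -EINVAL;
	}
}

/* The caller owns the loop and is the only place that "rolls". */
int demo_remove(struct demo_ctx *ctx)
{
	int error;

	while ((error = demo_remove_iter(ctx)) == -EAGAIN)
		ctx->rolls++;	/* stand-in for rolling the transaction */

	return error;
}
```

With two remote blocks the iterator returns -EAGAIN three times (two
unmaps plus the transition into the shrink state) before it returns 0.
No goto into the middle of a function is needed because each state
maps to a whole helper call.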

> > >   	struct xfs_da_geometry *geo;	/* da block geometry */
> > >   	struct xfs_name	name;		/* name, length and argument  flags*/
> > >   	uint8_t		filetype;	/* filetype of inode for directories */
> > > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > > index 1887605..9a649d1 100644
> > > --- a/fs/xfs/scrub/common.c
> > > +++ b/fs/xfs/scrub/common.c
> > > @@ -24,6 +24,8 @@
> > >   #include "xfs_rmap_btree.h"
> > >   #include "xfs_log.h"
> > >   #include "xfs_trans_priv.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_reflink.h"
> > >   #include "scrub/scrub.h"
> > 
> > Hmmm - why are these new includes necessary? You didn't add anything
> > new to these files or common header files to make the includes
> > needed....
> 
> Because the delayed attr context uses things from those headers.  And we put
> the context in xfs_da_args.  Now everything that uses xfs_da_args needs
> those includes.  But maybe if we do what you suggest above, we won't need to.
> :-)

put:

struct xfs_da_state;

and whatever other forward declarations are required for the pointer
types used in the delayed attr context at the top of xfs_attr.h.

These are just pointers in the structure, so we don't need the full
structure definitions if the pointers aren't actually dereferenced
by the code that includes the header file.
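
A small single-file sketch of why the forward declarations are enough.
The demo_* names are made up for illustration and do not mirror the
real XFS definitions; the point is only that a struct holding pointers
to incomplete types compiles, and that the full definition is needed
solely where a pointer is dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declarations only: the types are incomplete at this point. */
struct demo_da_state;
struct demo_da_state_blk;

/* Pointer members are fine with incomplete types. */
struct demo_delattr_context {
	struct demo_da_state		*da_state;
	struct demo_da_state_blk	*blk;
	unsigned int			flags;
};

/* Storing and comparing the pointers needs no full definition. */
void demo_ctx_save(struct demo_delattr_context *dac,
		   struct demo_da_state *state,
		   struct demo_da_state_blk *blk)
{
	dac->da_state = state;
	dac->blk = blk;
}

int demo_ctx_has_state(const struct demo_delattr_context *dac)
{
	return dac->da_state != NULL;
}

/*
 * Only code that dereferences the pointer needs the full definition,
 * which in a real tree would live in a separate header.
 */
struct demo_da_state {
	int	active;
};

int demo_state_active(const struct demo_delattr_context *dac)
{
	assert(dac->da_state != NULL);
	return dac->da_state->active;
}
```

Everything above the full definition of struct demo_da_state plays the
role of a file that includes only the header with the forward
declarations.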

Cheers,

Dave.
Allison Henderson Feb. 27, 2020, 4:18 a.m. UTC | #12
On 2/26/20 3:34 PM, Dave Chinner wrote:
> On Tue, Feb 25, 2020 at 04:57:46PM -0800, Allison Collins wrote:
>> On 2/25/20 1:57 AM, Dave Chinner wrote:
>>> On Sat, Feb 22, 2020 at 07:06:05PM -0700, Allison Collins wrote:
>>>> +out:
>>>> +	return error;
>>>> +}
>>>
>>> Brian commented on the structure of this loop better than I could.
>>>
>>>> +
>>>> +/*
>>>> + * Remove the attribute specified in @args.
>>>> + *
>>>> + * This function may return -EAGAIN to signal that the transaction needs to be
>>>> + * rolled.  Callers should continue calling this function until they receive a
>>>> + * return value other than -EAGAIN.
>>>> + */
>>>> +int
>>>> +xfs_attr_remove_iter(
>>>>    	struct xfs_da_args      *args)
>>>>    {
>>>>    	struct xfs_inode	*dp = args->dp;
>>>>    	int			error;
>>>> +	/* State machine switch */
>>>> +	switch (args->dac.dela_state) {
>>>> +	case XFS_DAS_RM_SHRINK:
>>>> +	case XFS_DAS_RMTVAL_REMOVE:
>>>> +		goto node;
>>>> +	default:
>>>> +		break;
>>>> +	}
>>>
>>> Why separate out the state machine? Doesn't this shortcut the
>>> xfs_inode_hasattr() check? Shouldn't that come first?
>> Well, the idea is that when we first start the routine, we come in with
>> neither state set, and we fall through to the break.  So we execute the
>> check the first time through.
>>
>> Though now that you point it out, I should probably go back and put the
>> explicit numbering back in the enum (starting with 1) or they will default
>> to zero, which would be incorrect.  I had pulled it out in one of the last
>> reviews thinking it would be ok, but it should go back in.
>>
>>>
>>> As it is:
>>>
>>> 	case XFS_DAS_RM_SHRINK:
>>> 	case XFS_DAS_RMTVAL_REMOVE:
>>> 		return xfs_attr_node_removename(args);
>>> 	default:
>>> 		break;
>>>
>>> would be nicer, and if this is the only way we can get to
>>> xfs_attr_node_removename(), getting rid of it from the code
>>> below could be done, too.
>> Well, the remove path is a lot simpler than the set path, so that trick does
>> work here :-)
>>
>> The idea though was to establish "jump points" with the "XFS_DAS_*" states.
>> Based on the state, we jump back to where we were.  We could break this
>> pattern for the remove path, but I don't think we'd want to do the same for
>> the others.  The set routine is a really big function that would end up
>> being inside a really big switch!
> 
> Right, which is why I think it should be factored into function
> calls first, then the switch statement simply becomes a small set of
> function calls.
> 
> We use that pattern quite a bit in the da_btree code to call
> the correct dir/attr function based on the type of block we are
> manipulating (i.e. based on da_state context). e.g. xfs_da3_split(),
> xfs_da3_join(), etc.
I see, sure, will do.  The patches were ordered much that way in the last
version, so it wouldn't be hard to undo.

> 
>>>>    	struct xfs_da_geometry *geo;	/* da block geometry */
>>>>    	struct xfs_name	name;		/* name, length and argument  flags*/
>>>>    	uint8_t		filetype;	/* filetype of inode for directories */
>>>> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
>>>> index 1887605..9a649d1 100644
>>>> --- a/fs/xfs/scrub/common.c
>>>> +++ b/fs/xfs/scrub/common.c
>>>> @@ -24,6 +24,8 @@
>>>>    #include "xfs_rmap_btree.h"
>>>>    #include "xfs_log.h"
>>>>    #include "xfs_trans_priv.h"
>>>> +#include "xfs_da_format.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_attr.h"
>>>>    #include "xfs_reflink.h"
>>>>    #include "scrub/scrub.h"
>>>
>>> Hmmm - why are these new includes necessary? You didn't add anything
>>> new to these files or common header files to make the includes
>>> needed....
>>
>> Because the delayed attr context uses things from those headers.  And we put
>> the context in xfs_da_args.  Now everything that uses xfs_da_args needs
>> those includes.  But maybe if we do what you suggest above, we won't need to.
>> :-)
> 
> put:
> 
> struct xfs_da_state;
> 
> and whatever other forward declarations are required for the pointer
> types used in the delayed attr context at the top of xfs_attr.h.
> 
> These are just pointers in the structure, so we don't need the full
> structure definitions if the pointers aren't actually dereferenced
> by the code that includes the header file.
Alrighty, will fix.

Thanks for the reviews!
Allison

> 
> Cheers,
> 
> Dave.
>
Chandan Rajendra March 3, 2020, 5:03 a.m. UTC | #13
On Sunday, February 23, 2020 7:36 AM Allison Collins wrote: 
> This patch modifies the attr remove routines to be delay ready. This means they no
> longer roll or commit transactions, but instead return -EAGAIN to have the calling
> routine roll and refresh the transaction. In this series, xfs_attr_remove_args has
> become xfs_attr_remove_iter, which uses a sort of state machine like switch to keep
> track of where it was when EAGAIN was returned. xfs_attr_node_removename has also
> been modified to use the switch, and a  new version of xfs_attr_remove_args
> consists of a simple loop to refresh the transaction until the operation is
> completed.
> 
> This patch also adds a new struct xfs_delattr_context, which we will use to keep
> track of the current state of an attribute operation. The new xfs_delattr_state
> enum is used to track various operations that are in progress so that we know not
> to repeat them, and resume where we left off before EAGAIN was returned to cycle
> out the transaction. Other members take the place of local variables that need
> to retain their values across multiple function recalls.
> 
> Below is a state machine diagram for attr remove operations. The XFS_DAS_* states
> indicate places where the function would return -EAGAIN, and then immediately
> resume from after being recalled by the calling function.  States marked as a
> "subroutine state" indicate that they belong to a subroutine, and so the calling
> function needs to pass them back to that subroutine to allow it to finish where
> it left off. But they otherwise do not have a role in the calling function other
> than just passing through.
> 
>  xfs_attr_remove_iter()
>          XFS_DAS_RM_SHRINK     ─┐
>          (subroutine state)     │
>                                 │
>          XFS_DAS_RMTVAL_REMOVE ─┤
>          (subroutine state)     │
>                                 └─>xfs_attr_node_removename()
>                                                  │
>                                                  v
>                                          need to remove
>                                    ┌─n──  rmt blocks?
>                                    │             │
>                                    │             y
>                                    │             │
>                                    │             v
>                                    │  ┌─>XFS_DAS_RMTVAL_REMOVE
>                                    │  │          │
>                                    │  │          v
>                                    │  └──y── more blks
>                                    │         to remove?
>                                    │             │
>                                    │             n
>                                    │             │
>                                    │             v
>                                    │         need to
>                                    └─────> shrink tree? ─n─┐
>                                                  │         │
>                                                  y         │
>                                                  │         │
>                                                  v         │
>                                          XFS_DAS_RM_SHRINK │
>                                                  │         │
>                                                  v         │
>                                                 done <─────┘
> 
> Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c     | 114 +++++++++++++++++++++++++++++++++++++------
>  fs/xfs/libxfs/xfs_attr.h     |   1 +
>  fs/xfs/libxfs/xfs_da_btree.h |  30 ++++++++++++
>  fs/xfs/scrub/common.c        |   2 +
>  fs/xfs/xfs_acl.c             |   2 +
>  fs/xfs/xfs_attr_list.c       |   1 +
>  fs/xfs/xfs_ioctl.c           |   2 +
>  fs/xfs/xfs_ioctl32.c         |   2 +
>  fs/xfs/xfs_iops.c            |   2 +
>  fs/xfs/xfs_xattr.c           |   1 +
>  10 files changed, 141 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 5d73bdf..cd3a3f7 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -368,11 +368,60 @@ xfs_has_attr(
>   */
>  int
>  xfs_attr_remove_args(
> +	struct xfs_da_args	*args)
> +{
> +	int			error = 0;
> +	int			err2 = 0;
> +
> +	do {
> +		error = xfs_attr_remove_iter(args);
> +		if (error && error != -EAGAIN)
> +			goto out;
> +
> +		if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
> +			args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
> +
> +			err2 = xfs_defer_finish(&args->trans);
> +			if (err2) {
> +				error = err2;
> +				goto out;
> +			}
> +		}
> +
> +		err2 = xfs_trans_roll_inode(&args->trans, args->dp);
> +		if (err2) {
> +			error = err2;
> +			goto out;
> +		}
> +
> +	} while (error == -EAGAIN);
> +out:
> +	return error;
> +}
> +
> +/*
> + * Remove the attribute specified in @args.
> + *
> + * This function may return -EAGAIN to signal that the transaction needs to be
> + * rolled.  Callers should continue calling this function until they receive a
> + * return value other than -EAGAIN.
> + */
> +int
> +xfs_attr_remove_iter(
>  	struct xfs_da_args      *args)
>  {
>  	struct xfs_inode	*dp = args->dp;
>  	int			error;
>  
> +	/* State machine switch */
> +	switch (args->dac.dela_state) {
> +	case XFS_DAS_RM_SHRINK:
> +	case XFS_DAS_RMTVAL_REMOVE:
> +		goto node;
> +	default:
> +		break;
> +	}
> +

On the very first invocation of xfs_attr_remove_iter() from
xfs_attr_remove_args() (via a call from xfs_attr_remove()),
args->dac.dela_state is set to a value of 0. This happens because
xfs_attr_args_init() invokes memset() on args. A value of 0 for
args->dac.dela_state maps to XFS_DAS_RM_SHRINK.

If the xattr was stored in, say, local or leaf format, we would end up
incorrectly invoking xfs_attr_node_removename(), right?
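
The concern can be reproduced in isolation. The sketch below uses
made-up demo_* types rather than the real xfs_da_args, and assumes only
what is stated above: the args structure is zeroed with memset() and the
enum in the posted patch has no explicit values. It contrasts that with
an enum numbered from 1, the fix mentioned earlier in the thread.

```c
#include <assert.h>
#include <string.h>

/* As posted: no explicit values, so the first state is 0. */
enum demo_state_v7 {
	DEMO_V7_RM_SHRINK,
	DEMO_V7_RMTVAL_REMOVE,
};

/* Numbered from 1, so 0 can mean "no state set yet". */
enum demo_state_fixed {
	DEMO_FIX_RM_SHRINK = 1,
	DEMO_FIX_RMTVAL_REMOVE,
};

struct demo_args_v7 {
	enum demo_state_v7	dela_state;
};

struct demo_args_fixed {
	enum demo_state_fixed	dela_state;
};

/* Returns 1 if a freshly zeroed args would take the "goto node" path. */
int demo_v7_jumps_to_node(void)
{
	struct demo_args_v7 args;

	memset(&args, 0, sizeof(args));
	assert(args.dela_state == 0);

	switch (args.dela_state) {
	case DEMO_V7_RM_SHRINK:
	case DEMO_V7_RMTVAL_REMOVE:
		return 1;
	default:
		return 0;
	}
}

/* Same check with the enum numbered from 1. */
int demo_fixed_jumps_to_node(void)
{
	struct demo_args_fixed args;

	memset(&args, 0, sizeof(args));
	assert(args.dela_state == 0);

	switch (args.dela_state) {
	case DEMO_FIX_RM_SHRINK:
	case DEMO_FIX_RMTVAL_REMOVE:
		return 1;
	default:
		return 0;
	}
}
```

In the first variant the zeroed state matches the first enumerator, so
the first call takes the node path regardless of the attr fork format.
In the second variant 0 matches no case and the switch falls through to
the default branch.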
Allison Henderson March 3, 2020, 5:40 a.m. UTC | #14
On 3/2/20 10:03 PM, Chandan Rajendra wrote:
> On Sunday, February 23, 2020 7:36 AM Allison Collins wrote:
>> This patch modifies the attr remove routines to be delay ready. This means they no
>> longer roll or commit transactions, but instead return -EAGAIN to have the calling
>> routine roll and refresh the transaction. In this series, xfs_attr_remove_args has
>> become xfs_attr_remove_iter, which uses a sort of state machine like switch to keep
>> track of where it was when EAGAIN was returned. xfs_attr_node_removename has also
>> been modified to use the switch, and a  new version of xfs_attr_remove_args
>> consists of a simple loop to refresh the transaction until the operation is
>> completed.
>>
>> This patch also adds a new struct xfs_delattr_context, which we will use to keep
>> track of the current state of an attribute operation. The new xfs_delattr_state
>> enum is used to track various operations that are in progress so that we know not
>> to repeat them, and resume where we left off before EAGAIN was returned to cycle
>> out the transaction. Other members take the place of local variables that need
>> to retain their values across multiple function recalls.
>>
>> Below is a state machine diagram for attr remove operations. The XFS_DAS_* states
>> indicate places where the function would return -EAGAIN, and then immediately
>> resume from after being recalled by the calling function.  States marked as a
>> "subroutine state" indicate that they belong to a subroutine, and so the calling
>> function needs to pass them back to that subroutine to allow it to finish where
>> it left off. But they otherwise do not have a role in the calling function other
>> than just passing through.
>>
>>   xfs_attr_remove_iter()
>>           XFS_DAS_RM_SHRINK     ─┐
>>           (subroutine state)     │
>>                                  │
>>           XFS_DAS_RMTVAL_REMOVE ─┤
>>           (subroutine state)     │
>>                                  └─>xfs_attr_node_removename()
>>                                                   │
>>                                                   v
>>                                           need to remove
>>                                     ┌─n──  rmt blocks?
>>                                     │             │
>>                                     │             y
>>                                     │             │
>>                                     │             v
>>                                     │  ┌─>XFS_DAS_RMTVAL_REMOVE
>>                                     │  │          │
>>                                     │  │          v
>>                                     │  └──y── more blks
>>                                     │         to remove?
>>                                     │             │
>>                                     │             n
>>                                     │             │
>>                                     │             v
>>                                     │         need to
>>                                     └─────> shrink tree? ─n─┐
>>                                                   │         │
>>                                                   y         │
>>                                                   │         │
>>                                                   v         │
>>                                           XFS_DAS_RM_SHRINK │
>>                                                   │         │
>>                                                   v         │
>>                                                  done <─────┘
>>
>> Signed-off-by: Allison Collins <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c     | 114 +++++++++++++++++++++++++++++++++++++------
>>   fs/xfs/libxfs/xfs_attr.h     |   1 +
>>   fs/xfs/libxfs/xfs_da_btree.h |  30 ++++++++++++
>>   fs/xfs/scrub/common.c        |   2 +
>>   fs/xfs/xfs_acl.c             |   2 +
>>   fs/xfs/xfs_attr_list.c       |   1 +
>>   fs/xfs/xfs_ioctl.c           |   2 +
>>   fs/xfs/xfs_ioctl32.c         |   2 +
>>   fs/xfs/xfs_iops.c            |   2 +
>>   fs/xfs/xfs_xattr.c           |   1 +
>>   10 files changed, 141 insertions(+), 16 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index 5d73bdf..cd3a3f7 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -368,11 +368,60 @@ xfs_has_attr(
>>    */
>>   int
>>   xfs_attr_remove_args(
>> +	struct xfs_da_args	*args)
>> +{
>> +	int			error = 0;
>> +	int			err2 = 0;
>> +
>> +	do {
>> +		error = xfs_attr_remove_iter(args);
>> +		if (error && error != -EAGAIN)
>> +			goto out;
>> +
>> +		if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
>> +			args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
>> +
>> +			err2 = xfs_defer_finish(&args->trans);
>> +			if (err2) {
>> +				error = err2;
>> +				goto out;
>> +			}
>> +		}
>> +
>> +		err2 = xfs_trans_roll_inode(&args->trans, args->dp);
>> +		if (err2) {
>> +			error = err2;
>> +			goto out;
>> +		}
>> +
>> +	} while (error == -EAGAIN);
>> +out:
>> +	return error;
>> +}
>> +
>> +/*
>> + * Remove the attribute specified in @args.
>> + *
>> + * This function may return -EAGAIN to signal that the transaction needs to be
>> + * rolled.  Callers should continue calling this function until they receive a
>> + * return value other than -EAGAIN.
>> + */
>> +int
>> +xfs_attr_remove_iter(
>>   	struct xfs_da_args      *args)
>>   {
>>   	struct xfs_inode	*dp = args->dp;
>>   	int			error;
>>   
>> +	/* State machine switch */
>> +	switch (args->dac.dela_state) {
>> +	case XFS_DAS_RM_SHRINK:
>> +	case XFS_DAS_RMTVAL_REMOVE:
>> +		goto node;
>> +	default:
>> +		break;
>> +	}
>> +
> 
> On the very first invocation of xfs_attr_remove_iter() from
> xfs_attr_remove_args() (via a call from xfs_attr_remove()),
> args->dac.dela_state is set to a value of 0. This happens because
> xfs_attr_args_init() invokes memset() on args. A value of 0 for
> args->dac.dela_state maps to XFS_DAS_RM_SHRINK.
> 
> If the xattr was stored in, say, local or leaf format, we end up incorrectly
> invoking xfs_attr_node_removename(), right?
> 
Hi Chandan,

Yes, this came up in one of the other reviews too.  The indexing for the 
XFS_DAS_* enum should start at 1, not zero.  I had pulled it out of the 
last version thinking it would be ok, but I should have kept the 
indexing starting at 1, allowing the switch to fall through to default.

Allison
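
To make the point above concrete, here is a minimal user-space sketch of the
pattern, assuming the enum is changed to start at 1. Every name in it
(das_state, DAS_UNINIT, struct dac, dac_init, node_removename, remove_iter,
remove_args, blks_left, node_calls) is a hypothetical stand-in and not the
real XFS code; the transaction roll is only a comment. It is meant to show
that a memset() context falls through to the default case, so local and leaf
format removals never reach the node path.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for enum xfs_delattr_state, with 0 kept unused. */
enum das_state {
	DAS_UNINIT = 0,		/* what memset() leaves behind */
	DAS_RM_SHRINK,		/* 1: resuming at the tree shrink step */
	DAS_RMTVAL_REMOVE,	/* 2: resuming remote block removal */
};

/* Hypothetical stand-in for struct xfs_delattr_context. */
struct dac {
	enum das_state	dela_state;
	int		blks_left;	/* remote blocks still to unmap */
	int		node_calls;	/* times the node path was entered */
};

/* Mirrors the memset() done when the args are initialised. */
void dac_init(struct dac *dac, int rmt_blocks)
{
	memset(dac, 0, sizeof(*dac));
	dac->blks_left = rmt_blocks;
}

/* Node-format removal: one unit of work per call, -EAGAIN to be recalled. */
int node_removename(struct dac *dac)
{
	dac->node_calls++;
	assert(dac->blks_left >= 0);

	switch (dac->dela_state) {
	case DAS_RMTVAL_REMOVE:
		goto rm_node_blks;
	case DAS_RM_SHRINK:
		goto rm_shrink;
	default:
		break;
	}

rm_node_blks:
	if (dac->blks_left > 0) {
		dac->blks_left--;	/* unmap one remote block */
		dac->dela_state = DAS_RMTVAL_REMOVE;
		return -EAGAIN;
	}
	/* Stand-in for the btree join; ask for a roll before shrinking. */
	dac->dela_state = DAS_RM_SHRINK;
	return -EAGAIN;

rm_shrink:
	return 0;
}

/* One step of the removal; resumes in the node path only for states 1, 2. */
int remove_iter(struct dac *dac, int is_node_format)
{
	switch (dac->dela_state) {
	case DAS_RM_SHRINK:
	case DAS_RMTVAL_REMOVE:
		return node_removename(dac);
	default:
		break;		/* DAS_UNINIT: first call, pick by format */
	}

	if (!is_node_format)
		return 0;	/* local or leaf format: done in one pass */
	return node_removename(dac);
}

/* Driver loop: recall the iterator until it stops returning -EAGAIN. */
int remove_args(struct dac *dac, int is_node_format)
{
	int error;

	do {
		error = remove_iter(dac, is_node_format);
		/* the real caller would roll the transaction here */
	} while (error == -EAGAIN);
	return error;
}
```

In this sketch, a freshly initialised context with is_node_format == 0
returns without ever calling node_removename(). If DAS_RM_SHRINK were 0
instead, the first switch in remove_iter() would jump straight to the node
path, which is the problem Chandan describes.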

Patch

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 5d73bdf..cd3a3f7 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -368,11 +368,60 @@  xfs_has_attr(
  */
 int
 xfs_attr_remove_args(
+	struct xfs_da_args	*args)
+{
+	int			error = 0;
+	int			err2 = 0;
+
+	do {
+		error = xfs_attr_remove_iter(args);
+		if (error && error != -EAGAIN)
+			goto out;
+
+		if (args->dac.flags & XFS_DAC_FINISH_TRANS) {
+			args->dac.flags &= ~XFS_DAC_FINISH_TRANS;
+
+			err2 = xfs_defer_finish(&args->trans);
+			if (err2) {
+				error = err2;
+				goto out;
+			}
+		}
+
+		err2 = xfs_trans_roll_inode(&args->trans, args->dp);
+		if (err2) {
+			error = err2;
+			goto out;
+		}
+
+	} while (error == -EAGAIN);
+out:
+	return error;
+}
+
+/*
+ * Remove the attribute specified in @args.
+ *
+ * This function may return -EAGAIN to signal that the transaction needs to be
+ * rolled.  Callers should continue calling this function until they receive a
+ * return value other than -EAGAIN.
+ */
+int
+xfs_attr_remove_iter(
 	struct xfs_da_args      *args)
 {
 	struct xfs_inode	*dp = args->dp;
 	int			error;
 
+	/* State machine switch */
+	switch (args->dac.dela_state) {
+	case XFS_DAS_RM_SHRINK:
+	case XFS_DAS_RMTVAL_REMOVE:
+		goto node;
+	default:
+		break;
+	}
+
 	if (!xfs_inode_hasattr(dp)) {
 		error = -ENOATTR;
 	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
@@ -381,6 +430,7 @@  xfs_attr_remove_args(
 	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
 		error = xfs_attr_leaf_removename(args);
 	} else {
+node:
 		error = xfs_attr_node_removename(args);
 	}
 
@@ -895,9 +945,8 @@  xfs_attr_leaf_removename(
 		/* bp is gone due to xfs_da_shrink_inode */
 		if (error)
 			return error;
-		error = xfs_defer_finish(&args->trans);
-		if (error)
-			return error;
+
+		args->dac.flags |= XFS_DAC_FINISH_TRANS;
 	}
 	return 0;
 }
@@ -1218,6 +1267,11 @@  xfs_attr_node_addname(
  * This will involve walking down the Btree, and may involve joining
  * leaf nodes and even joining intermediate nodes up to and including
  * the root node (a special case of an intermediate node).
+ *
+ * This routine is meant to function as either an inline or delayed operation,
+ * and may return -EAGAIN when the transaction needs to be rolled.  Calling
+ * functions will need to handle this, and recall the function until it
+ * returns a value other than -EAGAIN.
  */
 STATIC int
 xfs_attr_node_removename(
@@ -1230,10 +1284,24 @@  xfs_attr_node_removename(
 	struct xfs_inode	*dp = args->dp;
 
 	trace_xfs_attr_node_removename(args);
+	state = args->dac.da_state;
+	blk = args->dac.blk;
+
+	/* State machine switch */
+	switch (args->dac.dela_state) {
+	case XFS_DAS_RMTVAL_REMOVE:
+		goto rm_node_blks;
+	case XFS_DAS_RM_SHRINK:
+		goto rm_shrink;
+	default:
+		break;
+	}
 
 	error = xfs_attr_node_hasname(args, &state);
 	if (error != -EEXIST)
 		goto out;
+	else
+		error = 0;
 
 	/*
 	 * If there is an out-of-line value, de-allocate the blocks.
@@ -1243,6 +1311,14 @@  xfs_attr_node_removename(
 	blk = &state->path.blk[ state->path.active-1 ];
 	ASSERT(blk->bp != NULL);
 	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
+
+	/*
+	 * Store blk and state in the context in case we need to cycle out the
+	 * transaction.
+	 */
+	args->dac.blk = blk;
+	args->dac.da_state = state;
+
 	if (args->rmtblkno > 0) {
 		/*
 		 * Fill in disk block numbers in the state structure
@@ -1261,13 +1337,21 @@  xfs_attr_node_removename(
 		if (error)
 			goto out;
 
-		error = xfs_trans_roll_inode(&args->trans, args->dp);
+		error = xfs_attr_rmtval_invalidate(args);
 		if (error)
 			goto out;
+	}
 
-		error = xfs_attr_rmtval_remove(args);
-		if (error)
-			goto out;
+rm_node_blks:
+
+	if (args->rmtblkno > 0) {
+		error = xfs_attr_rmtval_unmap(args);
+
+		if (error) {
+			if (error == -EAGAIN)
+				args->dac.dela_state = XFS_DAS_RMTVAL_REMOVE;
+			return error;
+		}
 
 		/*
 		 * Refill the state structure with buffers, the prior calls
@@ -1293,17 +1377,15 @@  xfs_attr_node_removename(
 		error = xfs_da3_join(state);
 		if (error)
 			goto out;
-		error = xfs_defer_finish(&args->trans);
-		if (error)
-			goto out;
-		/*
-		 * Commit the Btree join operation and start a new trans.
-		 */
-		error = xfs_trans_roll_inode(&args->trans, dp);
-		if (error)
-			goto out;
+
+		args->dac.flags |= XFS_DAC_FINISH_TRANS;
+		args->dac.dela_state = XFS_DAS_RM_SHRINK;
+		return -EAGAIN;
 	}
 
+rm_shrink:
+	args->dac.dela_state = XFS_DAS_RM_SHRINK;
+
 	/*
 	 * If the result is small enough, push it all into the inode.
 	 */
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index ce7b039..ea873a5 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -155,6 +155,7 @@  int xfs_attr_set_args(struct xfs_da_args *args);
 int xfs_attr_remove(struct xfs_inode *dp, struct xfs_name *name, int flags);
 int xfs_has_attr(struct xfs_da_args *args);
 int xfs_attr_remove_args(struct xfs_da_args *args);
+int xfs_attr_remove_iter(struct xfs_da_args *args);
 int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
 		  int flags, struct attrlist_cursor_kern *cursor);
 bool xfs_attr_namecheck(const void *name, size_t length);
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index 14f1be3..3c78498 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -50,9 +50,39 @@  enum xfs_dacmp {
 };
 
 /*
+ * Enum values for xfs_delattr_context.dela_state
+ *
+ * These values are used by delayed attribute operations to keep track of where
+ * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
+ * calling function to roll the transaction, and then recall the subroutine to
+ * finish the operation.  The enum is then used by the subroutine to jump back
+ * to where it was and resume executing where it left off.
+ */
+enum xfs_delattr_state {
+	XFS_DAS_RM_SHRINK,	/* We are shrinking the tree */
+	XFS_DAS_RMTVAL_REMOVE,	/* We are removing remote value blocks */
+};
+
+/*
+ * Defines for xfs_delattr_context.flags
+ */
+#define	XFS_DAC_FINISH_TRANS	0x1 /* indicates to finish the transaction */
+
+/*
+ * Context used for keeping track of delayed attribute operations
+ */
+struct xfs_delattr_context {
+	struct xfs_da_state	*da_state;
+	struct xfs_da_state_blk *blk;
+	unsigned int		flags;
+	enum xfs_delattr_state	dela_state;
+};
+
+/*
  * Structure to ease passing around component names.
  */
 typedef struct xfs_da_args {
+	struct xfs_delattr_context dac; /* context used for delay attr ops */
 	struct xfs_da_geometry *geo;	/* da block geometry */
 	struct xfs_name	name;		/* name, length and argument  flags*/
 	uint8_t		filetype;	/* filetype of inode for directories */
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 1887605..9a649d1 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -24,6 +24,8 @@ 
 #include "xfs_rmap_btree.h"
 #include "xfs_log.h"
 #include "xfs_trans_priv.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_reflink.h"
 #include "scrub/scrub.h"
diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index 42ac847..d65e6d8 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -10,6 +10,8 @@ 
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
 #include "xfs_inode.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_trace.h"
 #include "xfs_error.h"
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index d37743b..881b9a4 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -12,6 +12,7 @@ 
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
 #include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_inode.h"
 #include "xfs_trans.h"
 #include "xfs_bmap.h"
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 28c07c9..7c1d9da 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -15,6 +15,8 @@ 
 #include "xfs_iwalk.h"
 #include "xfs_itable.h"
 #include "xfs_error.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_bmap.h"
 #include "xfs_bmap_util.h"
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 769581a..d504f8f 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -17,6 +17,8 @@ 
 #include "xfs_itable.h"
 #include "xfs_fsops.h"
 #include "xfs_rtalloc.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_ioctl.h"
 #include "xfs_ioctl32.h"
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index e85bbf5..a2d299f 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -13,6 +13,8 @@ 
 #include "xfs_inode.h"
 #include "xfs_acl.h"
 #include "xfs_quota.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_trans.h"
 #include "xfs_trace.h"
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 74133a5..d8dc72d 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -10,6 +10,7 @@ 
 #include "xfs_log_format.h"
 #include "xfs_da_format.h"
 #include "xfs_inode.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_acl.h"