[v7,00/19] xfs: Delayed Ready Attrs
mbox series

Message ID 20200223020611.1802-1-allison.henderson@oracle.com
Headers show
Series
  • xfs: Delayed Ready Attrs
Related show

Message

Allison Collins Feb. 23, 2020, 2:05 a.m. UTC
Hi all,

This set is a subset of a larger series for delayed attributes. Which is
a subset of an even larger series, parent pointers. Delayed attributes
allow attribute operations (set and remove) to be logged and committed
in the same way that other delayed operations do. This allows more
complex operations (like parent pointers) to be broken up into multiple
smaller transactions. To do this, the existing attr operations must be
modified to operate as either a delayed operation or a inline operation
since older filesystems will not be able to use the new log entries.
This means that they cannot roll, commit, or finish transactions.
Instead, they return  EAGAIN an allow the calling function to handle
the transaction. In this series, we focus on only the clean up and
refactoring needed to accomplish this. We will introduce delayed attrs
and parent pointers in a later set.

At the moment, I would like people to focus their review efforts on just
this "delay ready" subseries, as I think that is a more conservative use
of peoples review time.  I also think the set is a bit much to manage
all at once, and we need to get the infrastructure ironed out before we
focus too much anything that depends on it. But I do have the extended
series for folks that want to see the bigger picture of where this is
going.

This series can be viewed on github here:
https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_v7

As well as the extended delayed attribute and parent pointer series:
https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_v7_extended

And the test cases:
https://github.com/allisonhenderson/xfs_work/tree/pptr_xfstests

To run the xfs attributes tests run:
check -g attr

To run as delayed attributes run:
export MKFS_OPTIONS="-n delattr"
check -g attr

To run parent pointer tests:
check -g parent

Changes since v6:
Mostly review just updates.  In the last review folks asked for a
diagram to make the code flow more visual.  They have been added in
patches 13 and 14.  At the moment, they're in the commit message.  I
figure people can decide where or if they want them somewhere in the
code later, since they are a bit lengthy.  Patch 15 is a helper function
that used to be part of 14 but I separated it to better organize things.
Patches 18 and 19 are new, and are just cleanups that I thought would
be helpful.

I've also gone through the set and rearranged the order of some of the
patches to make sort of mini sets.  I thought it would help reviewers
break down the reviewing some. For reviewing purposes, the set could be
broken up into 5 different phases:

Clean ups (patches 1-2):
These two patches were actually review suggestions made quiet some time
ago in regaurds to commit d29f781 (Remove all strlen ...), which has
since been merged.  The idea of the cleanups was to collapse down the
number of parameters getting passed around the xfs_attr_* routines.  It
has since drawn both commentary and critisim, so it's a little unclear
to me if people want it or not.  It could easily be taken as a stand
alone clean up.  But it's also not a neccassary must have in so far as
the delayed operations are concerned.  It does however pepper around a
lot of change activity up through the set.  So it would be helpfull to
get a verdict one way or another, depending on how people feel about it.
   xfs: Replace attribute parameters with struct xfs_name
   xfs: Embed struct xfs_name in xfs_da_args

Error code filtering (patches 3-4):
These two patches are all about finding and catching error codes that
need to be sent back up to user space before starting delayed
operations.  Errors that happen during a delayed operation are treated
like interal errors that cause a shutdown.  But we wouldnt want that
for example: when the user tries to rename a non existent attr.  So the
idea is that we need to find all such conditions, and take care of them
before starting a delayed operation.
   xfs: Add xfs_has_attr and subroutines
   xfs: Check for -ENOATTR or -EEXIST

Move transactions upwards (patches 5-12):
The goal of this subset is to try and move all the transaction specific
code up the call stack much as possible.  The idea being that once we
get them to the top, we can introduce the statemachine to handle the
-EAGAIN logic where ever the transactions used to be.
   xfs: Factor out new helper functions xfs_attr_rmtval_set
   xfs: Factor out trans handling in xfs_attr3_leaf_flipflags
   xfs: Factor out xfs_attr_leaf_addname helper
   xfs: Refactor xfs_attr_try_sf_addname
   xfs: Factor out trans roll from xfs_attr3_leaf_setflag
   xfs: Factor out xfs_attr_rmtval_invalidate
   xfs: Factor out trans roll in xfs_attr3_leaf_clearflag
   xfs: Add helper function xfs_attr_rmtval_unmap

Introduce statemachine (patches 13-14):
Now that we have re-arranged the code such that we can remove the
transaction handleing, we proceed to do so.  The behavior of the attr
set/remove routines are now also compatible as a .finish_item callback
   xfs: Add delay ready attr remove routines
   xfs: Add delay ready attr set routines

Modularizing and cleanups (patches 15-19):
Now that we have pulled the transactions up to where we need them, it's
time to start breaking down the top level functions into new
subfunctions. The goal being to work towards a top level function that
deals mostly with the statemachine, and helpers for those states
   xfs: Add helper function xfs_attr_node_shrink
   xfs: Simplify xfs_attr_set_iter
   xfs: Add helper function xfs_attr_leaf_mark_incomplete
   xfs: Add remote block helper functions
   xfs: Remove xfs_attr_rmtval_remove


I've also made the corresponding updates to the user space side as well.

Questions, comment and feedback appreciated! 

Thanks all!
Allison

Allison Collins (18):
  xfs: Replace attribute parameters with struct xfs_name
  xfs: Embed struct xfs_name in xfs_da_args
  xfs: Check for -ENOATTR or -EEXIST
  xfs: Factor out new helper functions xfs_attr_rmtval_set
  xfs: Factor out trans handling in xfs_attr3_leaf_flipflags
  xfs: Factor out xfs_attr_leaf_addname helper
  xfs: Refactor xfs_attr_try_sf_addname
  xfs: Factor out trans roll from xfs_attr3_leaf_setflag
  xfs: Factor out xfs_attr_rmtval_invalidate
  xfs: Factor out trans roll in xfs_attr3_leaf_clearflag
  xfs: Add helper function xfs_attr_rmtval_unmap
  xfs: Add delay ready attr remove routines
  xfs: Add delay ready attr set routines
  xfs: Add helper function xfs_attr_node_shrink
  xfs: Simplify xfs_attr_set_iter
  xfs: Add helper function xfs_attr_leaf_mark_incomplete
  xfs: Add remote block helper functions
  xfs: Remove xfs_attr_rmtval_remove

Allison Henderson (1):
  xfs: Add xfs_has_attr and subroutines

 fs/xfs/libxfs/xfs_attr.c        | 947 ++++++++++++++++++++++++++++------------
 fs/xfs/libxfs/xfs_attr.h        |  15 +-
 fs/xfs/libxfs/xfs_attr_leaf.c   | 220 +++++-----
 fs/xfs/libxfs/xfs_attr_leaf.h   |   3 +
 fs/xfs/libxfs/xfs_attr_remote.c | 260 +++++++----
 fs/xfs/libxfs/xfs_attr_remote.h |   7 +-
 fs/xfs/libxfs/xfs_da_btree.c    |   6 +-
 fs/xfs/libxfs/xfs_da_btree.h    |  47 +-
 fs/xfs/libxfs/xfs_dir2.c        |  18 +-
 fs/xfs/libxfs/xfs_dir2_block.c  |   6 +-
 fs/xfs/libxfs/xfs_dir2_leaf.c   |   6 +-
 fs/xfs/libxfs/xfs_dir2_node.c   |   8 +-
 fs/xfs/libxfs/xfs_dir2_sf.c     |  30 +-
 fs/xfs/libxfs/xfs_types.c       |  11 +
 fs/xfs/libxfs/xfs_types.h       |   1 +
 fs/xfs/scrub/attr.c             |  12 +-
 fs/xfs/scrub/common.c           |   2 +
 fs/xfs/xfs_acl.c                |  27 +-
 fs/xfs/xfs_attr_list.c          |   1 +
 fs/xfs/xfs_ioctl.c              |  25 +-
 fs/xfs/xfs_ioctl32.c            |   2 +
 fs/xfs/xfs_iops.c               |   8 +-
 fs/xfs/xfs_trace.h              |  21 +-
 fs/xfs/xfs_xattr.c              |  29 +-
 24 files changed, 1140 insertions(+), 572 deletions(-)

Comments

Amir Goldstein Feb. 23, 2020, 7:55 a.m. UTC | #1
On Sun, Feb 23, 2020 at 4:06 AM Allison Collins
<allison.henderson@oracle.com> wrote:
>
> Hi all,
>
> This set is a subset of a larger series for delayed attributes. Which is
> a subset of an even larger series, parent pointers. Delayed attributes
> allow attribute operations (set and remove) to be logged and committed
> in the same way that other delayed operations do. This allows more
> complex operations (like parent pointers) to be broken up into multiple
> smaller transactions. To do this, the existing attr operations must be
> modified to operate as either a delayed operation or a inline operation
> since older filesystems will not be able to use the new log entries.

High level question, before I dive into the series:

Which other "delayed operations" already exist?
I think delayed operations were added by Darrick to handle the growth of
translation size due to reflink. Right? So I assume the existing delayed
operations deal with block accounting.
When speaking of parent pointers, without having looked into the details yet,
it seem the delayed operations we would want to log are operations that deal
with namespace changes, i.e.: link,unlink,rename.
The information needed to be logged for these ops is minimal.
Why do we need a general infrastructure for delayed attr operations?

Thanks,
Amir.
Allison Collins Feb. 23, 2020, 4:02 p.m. UTC | #2
On 2/23/20 12:55 AM, Amir Goldstein wrote:
> On Sun, Feb 23, 2020 at 4:06 AM Allison Collins
> <allison.henderson@oracle.com> wrote:
>>
>> Hi all,
>>
>> This set is a subset of a larger series for delayed attributes. Which is
>> a subset of an even larger series, parent pointers. Delayed attributes
>> allow attribute operations (set and remove) to be logged and committed
>> in the same way that other delayed operations do. This allows more
>> complex operations (like parent pointers) to be broken up into multiple
>> smaller transactions. To do this, the existing attr operations must be
>> modified to operate as either a delayed operation or a inline operation
>> since older filesystems will not be able to use the new log entries.
> 
> High level question, before I dive into the series:
> 
> Which other "delayed operations" already exist?
> I think delayed operations were added by Darrick to handle the growth of
> translation size due to reflink. Right? So I assume the existing delayed
> operations deal with block accounting.
Gosh, quite a few I think, but I'm not solid on what they all do.  If we 
take a peek at XFS_LI_TYPE_DESC, theres an identifier for each type, to 
give you an idea.  A lot of them do look like they are part of reflink 
operations though.

> When speaking of parent pointers, without having looked into the details yet,
> it seem the delayed operations we would want to log are operations that deal
> with namespace changes, i.e.: link,unlink,rename.
> The information needed to be logged for these ops is minimal.
> Why do we need a general infrastructure for delayed attr operations?
> 
> Thanks,
> Amir.
> 
Great question, this one goes back a ways.  I believe the train of logic 
we had is that because parent pointers also include the filename of the 
parent, its possible we can end up with really big attributes.  Which 
may run into a lot of block map/unmap activity for name space changes. 
We didnt want to end up with overly large transactions in the log, so we 
wanted to break them up by returning -EAGAIN where ever the transactions 
used to be rolled.  I'm pretty sure that covers a quick high level 
history of where we are now?  Did that answer your question?

Allison
Amir Goldstein Feb. 24, 2020, 6:30 a.m. UTC | #3
On Sun, Feb 23, 2020 at 6:02 PM Allison Collins
<allison.henderson@oracle.com> wrote:
>
>
>
> On 2/23/20 12:55 AM, Amir Goldstein wrote:
> > On Sun, Feb 23, 2020 at 4:06 AM Allison Collins
> > <allison.henderson@oracle.com> wrote:
> >>
> >> Hi all,
> >>
> >> This set is a subset of a larger series for delayed attributes. Which is
> >> a subset of an even larger series, parent pointers. Delayed attributes
> >> allow attribute operations (set and remove) to be logged and committed
> >> in the same way that other delayed operations do. This allows more
> >> complex operations (like parent pointers) to be broken up into multiple
> >> smaller transactions. To do this, the existing attr operations must be
> >> modified to operate as either a delayed operation or a inline operation
> >> since older filesystems will not be able to use the new log entries.
> >
> > High level question, before I dive into the series:
> >
> > Which other "delayed operations" already exist?
> > I think delayed operations were added by Darrick to handle the growth of
> > translation size due to reflink. Right? So I assume the existing delayed
> > operations deal with block accounting.
> Gosh, quite a few I think, but I'm not solid on what they all do.  If we
> take a peek at XFS_LI_TYPE_DESC, theres an identifier for each type, to
> give you an idea.  A lot of them do look like they are part of reflink
> operations though.
>
> > When speaking of parent pointers, without having looked into the details yet,
> > it seem the delayed operations we would want to log are operations that deal
> > with namespace changes, i.e.: link,unlink,rename.
> > The information needed to be logged for these ops is minimal.
> > Why do we need a general infrastructure for delayed attr operations?
> >
> > Thanks,
> > Amir.
> >
> Great question, this one goes back a ways.  I believe the train of logic
> we had is that because parent pointers also include the filename of the
> parent, its possible we can end up with really big attributes.  Which
> may run into a lot of block map/unmap activity for name space changes.
> We didnt want to end up with overly large transactions in the log, so we
> wanted to break them up by returning -EAGAIN where ever the transactions
> used to be rolled.  I'm pretty sure that covers a quick high level
> history of where we are now?  Did that answer your question?
>

Partly.
My question was like this:
It seems that your work is about implementing:
[intent to set xattr <new parent inode,gen,offset> <new name>]
[intent to remove xattr <old parent inode,gen,offset> <old name>]

While at a high level what the user really *intents* to do is:
[intent to link <inode> to <new parent inode>;<new name>]
[intent to unlink <inode> from <old parent inode>;<old name>]

I guess the log item sizes of the two variants is quite similar, so it
doesn't make much of a difference and deferred xattr ops are more
generic and may be used for other things in the future.

Another thing is that the transaction space required from directory
entry changes is (probably) already taken into account correctly
in the code, so there is no need to worry about deferred namespace
operations from that aspect, but from a pure design perspective,
if namespace operations become complex, *they* are the ones
that should be made into deferred operations.

Or maybe I am not reading the situations correctly at all...

Thanks,
Amir.
Chandan Rajendra Feb. 24, 2020, 8:31 a.m. UTC | #4
On Sunday, February 23, 2020 1:25 PM Amir Goldstein wrote: 
> On Sun, Feb 23, 2020 at 4:06 AM Allison Collins
> <allison.henderson@oracle.com> wrote:
> >
> > Hi all,
> >
> > This set is a subset of a larger series for delayed attributes. Which is
> > a subset of an even larger series, parent pointers. Delayed attributes
> > allow attribute operations (set and remove) to be logged and committed
> > in the same way that other delayed operations do. This allows more
> > complex operations (like parent pointers) to be broken up into multiple
> > smaller transactions. To do this, the existing attr operations must be
> > modified to operate as either a delayed operation or a inline operation
> > since older filesystems will not be able to use the new log entries.
> 
> High level question, before I dive into the series:
>

Hi Amir,

My 2 cents on this topic (Assuming "delayed operations" refer to "deferred
ops") ... 

> Which other "delayed operations" already exist?

static const struct xfs_defer_op_type *defer_op_types[] = {
        [XFS_DEFER_OPS_TYPE_BMAP]       = &xfs_bmap_update_defer_type,
        [XFS_DEFER_OPS_TYPE_REFCOUNT]   = &xfs_refcount_update_defer_type,
        [XFS_DEFER_OPS_TYPE_RMAP]       = &xfs_rmap_update_defer_type,
        [XFS_DEFER_OPS_TYPE_FREE]       = &xfs_extent_free_defer_type,
        [XFS_DEFER_OPS_TYPE_AGFL_FREE]  = &xfs_agfl_free_defer_type,
};


> I think delayed operations were added by Darrick to handle the growth of
> translation size due to reflink. Right? So I assume the existing delayed
> operations deal with block accounting.

IIRC, Deferred ops are meant to not violate the AG locking order.  If AG 'x'
metadata block[s] is locked then we can only lock metadata blocks of AGs
starting from 'x+1'. Transactions can return -EAGAIN when they detect that
they need to lock metadata blocks belonging to AG 'x-1' or less. In such a
case the transaction will be rolled during which the locks on metadata blocks
are given up.

> When speaking of parent pointers, without having looked into the details yet,
> it seem the delayed operations we would want to log are operations that deal
> with namespace changes, i.e.: link,unlink,rename.
> The information needed to be logged for these ops is minimal.
> Why do we need a general infrastructure for delayed attr operations?
> 
> Thanks,
> Amir.
> 
> 
> 
>
Allison Collins Feb. 24, 2020, 4:23 p.m. UTC | #5
On 2/23/20 11:30 PM, Amir Goldstein wrote:
> On Sun, Feb 23, 2020 at 6:02 PM Allison Collins
> <allison.henderson@oracle.com> wrote:
>>
>>
>>
>> On 2/23/20 12:55 AM, Amir Goldstein wrote:
>>> On Sun, Feb 23, 2020 at 4:06 AM Allison Collins
>>> <allison.henderson@oracle.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> This set is a subset of a larger series for delayed attributes. Which is
>>>> a subset of an even larger series, parent pointers. Delayed attributes
>>>> allow attribute operations (set and remove) to be logged and committed
>>>> in the same way that other delayed operations do. This allows more
>>>> complex operations (like parent pointers) to be broken up into multiple
>>>> smaller transactions. To do this, the existing attr operations must be
>>>> modified to operate as either a delayed operation or a inline operation
>>>> since older filesystems will not be able to use the new log entries.
>>>
>>> High level question, before I dive into the series:
>>>
>>> Which other "delayed operations" already exist?
>>> I think delayed operations were added by Darrick to handle the growth of
>>> translation size due to reflink. Right? So I assume the existing delayed
>>> operations deal with block accounting.
>> Gosh, quite a few I think, but I'm not solid on what they all do.  If we
>> take a peek at XFS_LI_TYPE_DESC, theres an identifier for each type, to
>> give you an idea.  A lot of them do look like they are part of reflink
>> operations though.
>>
>>> When speaking of parent pointers, without having looked into the details yet,
>>> it seem the delayed operations we would want to log are operations that deal
>>> with namespace changes, i.e.: link,unlink,rename.
>>> The information needed to be logged for these ops is minimal.
>>> Why do we need a general infrastructure for delayed attr operations?
>>>
>>> Thanks,
>>> Amir.
>>>
>> Great question, this one goes back a ways.  I believe the train of logic
>> we had is that because parent pointers also include the filename of the
>> parent, its possible we can end up with really big attributes.  Which
>> may run into a lot of block map/unmap activity for name space changes.
>> We didnt want to end up with overly large transactions in the log, so we
>> wanted to break them up by returning -EAGAIN where ever the transactions
>> used to be rolled.  I'm pretty sure that covers a quick high level
>> history of where we are now?  Did that answer your question?
>>
> 
> Partly.
> My question was like this:
> It seems that your work is about implementing:
> [intent to set xattr <new parent inode,gen,offset> <new name>]
> [intent to remove xattr <old parent inode,gen,offset> <old name>]
> 
> While at a high level what the user really *intents* to do is:
> [intent to link <inode> to <new parent inode>;<new name>]
> [intent to unlink <inode> from <old parent inode>;<old name>]
> 
> I guess the log item sizes of the two variants is quite similar, so it
> doesn't make much of a difference and deferred xattr ops are more
> generic and may be used for other things in the future.
> 
> Another thing is that the transaction space required from directory
> entry changes is (probably) already taken into account correctly
> in the code, so there is no need to worry about deferred namespace
> operations from that aspect, but from a pure design perspective,
> if namespace operations become complex, *they* are the ones
> that should be made into deferred operations.
> 
> Or maybe I am not reading the situations correctly at all...
Ok, I think I understand what you're trying to say.  Would it help to 
explain then that setting or removing an attr becomes part of the 
namespace operations later?  When we get up into the parent pointer set, 
a lot of those patches add an attribute set or remove every time we 
link, unlink, rename etc.  Did that help answer your question?

Allison

> 
> Thanks,
> Amir.
>
Amir Goldstein Feb. 25, 2020, 5:53 a.m. UTC | #6
> Ok, I think I understand what you're trying to say.  Would it help to
> explain then that setting or removing an attr becomes part of the
> namespace operations later?  When we get up into the parent pointer set,
> a lot of those patches add an attribute set or remove every time we
> link, unlink, rename etc.  Did that help answer your question?
>

I will wait for the parent pointers series,

Thanks,
Amir.
Dave Chinner Feb. 25, 2020, 9:52 a.m. UTC | #7
On Sun, Feb 23, 2020 at 09:55:48AM +0200, Amir Goldstein wrote:
> On Sun, Feb 23, 2020 at 4:06 AM Allison Collins
> <allison.henderson@oracle.com> wrote:
> >
> > Hi all,
> >
> > This set is a subset of a larger series for delayed attributes. Which is
> > a subset of an even larger series, parent pointers. Delayed attributes
> > allow attribute operations (set and remove) to be logged and committed
> > in the same way that other delayed operations do. This allows more
> > complex operations (like parent pointers) to be broken up into multiple
> > smaller transactions. To do this, the existing attr operations must be
> > modified to operate as either a delayed operation or a inline operation
> > since older filesystems will not be able to use the new log entries.
> 
> High level question, before I dive into the series:
> 
> Which other "delayed operations" already exist?

See Chandan's answer :P

> I think delayed operations were added by Darrick to handle the growth of
> translation size due to reflink. Right? So I assume the existing delayed
> operations deal with block accounting.

No, they are intended to allow atomic, recoverable multi-transaction
operations. They grew out of this:

https://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Atomic_Multi-Transaction_Operations

which was essentially an generalisation of the EFI/EFD intent
logging that has existed in XFS for 20 years.

Essentially, it is a mechanism of chaining intent operations to
ensure that recover will restart the operation at the point the
system failed so that once the operation is started (i.e. first
intent is logged to the journal) the entire operation is always
completed regardless of whether the system crashes or not.

> When speaking of parent pointers, without having looked into the details yet,
> it seem the delayed operations we would want to log are operations that deal
> with namespace changes, i.e.: link,unlink,rename.
> The information needed to be logged for these ops is minimal.

Not really. the parent pointers are held in attributes, so parent
pointers are effectively adding an attribute creation to every inode
allocation and an attribute modification to every directory
modification. And, well, when an inode has 100 million hard links,
it's going to have 100 million parent pointer attributes. Modifying
a link is then a major operation, and Chandan has done a great job
in analysing the attr btree to see if there are scalability issues
that will be exposed by this sort of attribute usage....

> Why do we need a general infrastructure for delayed attr operations?

These have to be done atomically with the create/unlink/rename/etc
and to include attribute modification in those transaction
reservations blows the size of them out massively (especially
rename!). By converting these operations to use defered operations
to add the parent pointer to the inode, we no longer need to
increase the log reservation for the operations (because the attr
reservation is usually smaller than the directory reservation), and
it is guaranteed to be atomic with the directory modification. i.e.
parent pointers never get out of sync, even when the system crashes.

Hence having attributes modified as a series of individual
operations chained together into an atomic whole via intents is a
pre-requisite for updating attributes atomically within directory
modification operations.

Cheers,

Dave.