Message ID | 20200223020611.1802-1-allison.henderson@oracle.com (mailing list archive) |
---|---|
Series | xfs: Delayed Ready Attrs |
On Sun, Feb 23, 2020 at 4:06 AM Allison Collins <allison.henderson@oracle.com> wrote:
>
> Hi all,
>
> This set is a subset of a larger series for delayed attributes. Which is
> a subset of an even larger series, parent pointers. Delayed attributes
> allow attribute operations (set and remove) to be logged and committed
> in the same way that other delayed operations do. This allows more
> complex operations (like parent pointers) to be broken up into multiple
> smaller transactions. To do this, the existing attr operations must be
> modified to operate as either a delayed operation or an inline operation
> since older filesystems will not be able to use the new log entries.

High level question, before I dive into the series:

Which other "delayed operations" already exist?
I think delayed operations were added by Darrick to handle the growth of
transaction size due to reflink. Right? So I assume the existing delayed
operations deal with block accounting.

When speaking of parent pointers, without having looked into the details yet,
it seems the delayed operations we would want to log are operations that deal
with namespace changes, i.e.: link, unlink, rename.
The information needed to be logged for these ops is minimal.
Why do we need a general infrastructure for delayed attr operations?

Thanks,
Amir.
On 2/23/20 12:55 AM, Amir Goldstein wrote:
> On Sun, Feb 23, 2020 at 4:06 AM Allison Collins
> <allison.henderson@oracle.com> wrote:
>>
>> Hi all,
>>
>> This set is a subset of a larger series for delayed attributes. Which is
>> a subset of an even larger series, parent pointers. Delayed attributes
>> allow attribute operations (set and remove) to be logged and committed
>> in the same way that other delayed operations do. This allows more
>> complex operations (like parent pointers) to be broken up into multiple
>> smaller transactions. To do this, the existing attr operations must be
>> modified to operate as either a delayed operation or an inline operation
>> since older filesystems will not be able to use the new log entries.
>
> High level question, before I dive into the series:
>
> Which other "delayed operations" already exist?
> I think delayed operations were added by Darrick to handle the growth of
> transaction size due to reflink. Right? So I assume the existing delayed
> operations deal with block accounting.

Gosh, quite a few I think, but I'm not solid on what they all do. If we
take a peek at XFS_LI_TYPE_DESC, there's an identifier for each type, to
give you an idea. A lot of them do look like they are part of reflink
operations though.

> When speaking of parent pointers, without having looked into the details yet,
> it seems the delayed operations we would want to log are operations that deal
> with namespace changes, i.e.: link, unlink, rename.
> The information needed to be logged for these ops is minimal.
> Why do we need a general infrastructure for delayed attr operations?
>
> Thanks,
> Amir.
>

Great question, this one goes back a ways. I believe the train of logic
we had is that because parent pointers also include the filename of the
parent, it's possible we can end up with really big attributes. Which
may run into a lot of block map/unmap activity for name space changes.
We didn't want to end up with overly large transactions in the log, so we
wanted to break them up by returning -EAGAIN wherever the transactions
used to be rolled. I'm pretty sure that covers a quick high level
history of where we are now? Did that answer your question?

Allison
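To illustrate the idea of returning -EAGAIN at the points where a transaction used to be rolled, here is a minimal sketch in plain userspace C. It is not code from this series: the `demo_attr_*` names, the states, and the roll counter are all hypothetical, and the real attr code has many more steps. The operation remembers how far it got, and the caller decides what to do between steps.

```c
#include <errno.h>

/* Hypothetical states for a multi-step attribute set (not kernel names). */
enum demo_attr_state {
	DEMO_ATTR_START,
	DEMO_ATTR_ALLOC_DONE,
	DEMO_ATTR_FINISHED,
};

struct demo_attr_op {
	enum demo_attr_state	state;
	int			rolls;	/* times the caller "rolled" */
};

/*
 * Run one step of the operation. Where older code would have rolled
 * the transaction internally, this returns -EAGAIN and lets the
 * caller roll, then call back in to resume from the saved state.
 */
static int demo_attr_set_step(struct demo_attr_op *op)
{
	switch (op->state) {
	case DEMO_ATTR_START:
		/* e.g. allocate space for the attribute */
		op->state = DEMO_ATTR_ALLOC_DONE;
		return -EAGAIN;
	case DEMO_ATTR_ALLOC_DONE:
		/* e.g. copy the name and value into that space */
		op->state = DEMO_ATTR_FINISHED;
		return 0;
	case DEMO_ATTR_FINISHED:
		return 0;
	}
	return -EINVAL;
}

/* Caller-side loop: each -EAGAIN is a point to roll and retry. */
static int demo_attr_set(struct demo_attr_op *op)
{
	int error;

	while ((error = demo_attr_set_step(op)) == -EAGAIN)
		op->rolls++;	/* stand-in for rolling the transaction */
	return error;
}
```

Because the state lives in `struct demo_attr_op` rather than on the call stack, the same step function can be driven either by a simple loop like `demo_attr_set()` or by some other mechanism that schedules each step as its own transaction.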
On Sun, Feb 23, 2020 at 6:02 PM Allison Collins <allison.henderson@oracle.com> wrote:
>
> On 2/23/20 12:55 AM, Amir Goldstein wrote:
> > On Sun, Feb 23, 2020 at 4:06 AM Allison Collins
> > <allison.henderson@oracle.com> wrote:
> >>
> >> Hi all,
> >>
> >> This set is a subset of a larger series for delayed attributes. Which is
> >> a subset of an even larger series, parent pointers. Delayed attributes
> >> allow attribute operations (set and remove) to be logged and committed
> >> in the same way that other delayed operations do. This allows more
> >> complex operations (like parent pointers) to be broken up into multiple
> >> smaller transactions. To do this, the existing attr operations must be
> >> modified to operate as either a delayed operation or an inline operation
> >> since older filesystems will not be able to use the new log entries.
> >
> > High level question, before I dive into the series:
> >
> > Which other "delayed operations" already exist?
> > I think delayed operations were added by Darrick to handle the growth of
> > transaction size due to reflink. Right? So I assume the existing delayed
> > operations deal with block accounting.
> Gosh, quite a few I think, but I'm not solid on what they all do. If we
> take a peek at XFS_LI_TYPE_DESC, there's an identifier for each type, to
> give you an idea. A lot of them do look like they are part of reflink
> operations though.
>
> > When speaking of parent pointers, without having looked into the details yet,
> > it seems the delayed operations we would want to log are operations that deal
> > with namespace changes, i.e.: link, unlink, rename.
> > The information needed to be logged for these ops is minimal.
> > Why do we need a general infrastructure for delayed attr operations?
> >
> > Thanks,
> > Amir.
> >
> Great question, this one goes back a ways. I believe the train of logic
> we had is that because parent pointers also include the filename of the
> parent, it's possible we can end up with really big attributes. Which
> may run into a lot of block map/unmap activity for name space changes.
> We didn't want to end up with overly large transactions in the log, so we
> wanted to break them up by returning -EAGAIN wherever the transactions
> used to be rolled. I'm pretty sure that covers a quick high level
> history of where we are now? Did that answer your question?
>

Partly.
My question was like this:
It seems that your work is about implementing:
[intent to set xattr <new parent inode,gen,offset> <new name>]
[intent to remove xattr <old parent inode,gen,offset> <old name>]

While at a high level what the user really *intends* to do is:
[intent to link <inode> to <new parent inode>;<new name>]
[intent to unlink <inode> from <old parent inode>;<old name>]

I guess the log item sizes of the two variants are quite similar, so it
doesn't make much of a difference and deferred xattr ops are more
generic and may be used for other things in the future.

Another thing is that the transaction space required from directory
entry changes is (probably) already taken into account correctly
in the code, so there is no need to worry about deferred namespace
operations from that aspect, but from a pure design perspective,
if namespace operations become complex, *they* are the ones
that should be made into deferred operations.

Or maybe I am not reading the situation correctly at all...

Thanks,
Amir.
On Sunday, February 23, 2020 1:25 PM Amir Goldstein wrote:
> On Sun, Feb 23, 2020 at 4:06 AM Allison Collins
> <allison.henderson@oracle.com> wrote:
> >
> > Hi all,
> >
> > This set is a subset of a larger series for delayed attributes. Which is
> > a subset of an even larger series, parent pointers. Delayed attributes
> > allow attribute operations (set and remove) to be logged and committed
> > in the same way that other delayed operations do. This allows more
> > complex operations (like parent pointers) to be broken up into multiple
> > smaller transactions. To do this, the existing attr operations must be
> > modified to operate as either a delayed operation or an inline operation
> > since older filesystems will not be able to use the new log entries.
>
> High level question, before I dive into the series:
>

Hi Amir,

My 2 cents on this topic (assuming "delayed operations" refers to
"deferred ops") ...

> Which other "delayed operations" already exist?

static const struct xfs_defer_op_type *defer_op_types[] = {
	[XFS_DEFER_OPS_TYPE_BMAP]	= &xfs_bmap_update_defer_type,
	[XFS_DEFER_OPS_TYPE_REFCOUNT]	= &xfs_refcount_update_defer_type,
	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
};

> I think delayed operations were added by Darrick to handle the growth of
> transaction size due to reflink. Right? So I assume the existing delayed
> operations deal with block accounting.

IIRC, deferred ops are meant to not violate the AG locking order. If AG
'x' metadata block[s] is locked then we can only lock metadata blocks of
AGs starting from 'x+1'. Transactions can return -EAGAIN when they
detect that they need to lock metadata blocks belonging to AG 'x-1' or
less. In such a case the transaction will be rolled, during which the
locks on metadata blocks are given up.

> When speaking of parent pointers, without having looked into the details yet,
> it seems the delayed operations we would want to log are operations that deal
> with namespace changes, i.e.: link, unlink, rename.
> The information needed to be logged for these ops is minimal.
> Why do we need a general infrastructure for delayed attr operations?
>
> Thanks,
> Amir.
>
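The AG locking rule described above can be sketched in a few lines of plain userspace C. This is only an illustration of ascending-order locking with -EAGAIN on a would-be violation: `struct demo_tx`, `demo_lock_ag()`, `demo_tx_roll()` and the fixed AG count are hypothetical names and simplifications, not XFS interfaces, and treating an already-held AG as success is an assumption of the sketch.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_AG_COUNT	8	/* hypothetical number of AGs */

/* Hypothetical per-transaction record of which AGs are locked. */
struct demo_tx {
	bool	held[DEMO_AG_COUNT];
	int	highest_held;	/* -1 when nothing is locked */
};

static void demo_tx_init(struct demo_tx *tx)
{
	for (int i = 0; i < DEMO_AG_COUNT; i++)
		tx->held[i] = false;
	tx->highest_held = -1;
}

/*
 * Lock AG 'agno' only if that keeps the ascending order. Asking for
 * an AG below the highest one already held would violate the order,
 * so report -EAGAIN and let the caller roll and retry.
 */
static int demo_lock_ag(struct demo_tx *tx, int agno)
{
	if (agno < 0 || agno >= DEMO_AG_COUNT)
		return -EINVAL;
	if (tx->held[agno])
		return 0;	/* assumption: re-locking a held AG is a no-op */
	if (agno < tx->highest_held)
		return -EAGAIN;
	tx->held[agno] = true;
	tx->highest_held = agno;
	return 0;
}

/* Rolling gives up every lock, so a lower AG can be taken afterwards. */
static void demo_tx_roll(struct demo_tx *tx)
{
	demo_tx_init(tx);
}
```

In this sketch a caller that holds AGs 3 and 5 and then needs AG 2 gets -EAGAIN, calls `demo_tx_roll()`, and can then lock AG 2 first in the next transaction.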
On 2/23/20 11:30 PM, Amir Goldstein wrote:
> On Sun, Feb 23, 2020 at 6:02 PM Allison Collins
> <allison.henderson@oracle.com> wrote:
>>
>> On 2/23/20 12:55 AM, Amir Goldstein wrote:
>>> On Sun, Feb 23, 2020 at 4:06 AM Allison Collins
>>> <allison.henderson@oracle.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> This set is a subset of a larger series for delayed attributes. Which is
>>>> a subset of an even larger series, parent pointers. Delayed attributes
>>>> allow attribute operations (set and remove) to be logged and committed
>>>> in the same way that other delayed operations do. This allows more
>>>> complex operations (like parent pointers) to be broken up into multiple
>>>> smaller transactions. To do this, the existing attr operations must be
>>>> modified to operate as either a delayed operation or an inline operation
>>>> since older filesystems will not be able to use the new log entries.
>>>
>>> High level question, before I dive into the series:
>>>
>>> Which other "delayed operations" already exist?
>>> I think delayed operations were added by Darrick to handle the growth of
>>> transaction size due to reflink. Right? So I assume the existing delayed
>>> operations deal with block accounting.
>> Gosh, quite a few I think, but I'm not solid on what they all do. If we
>> take a peek at XFS_LI_TYPE_DESC, there's an identifier for each type, to
>> give you an idea. A lot of them do look like they are part of reflink
>> operations though.
>>
>>> When speaking of parent pointers, without having looked into the details yet,
>>> it seems the delayed operations we would want to log are operations that deal
>>> with namespace changes, i.e.: link, unlink, rename.
>>> The information needed to be logged for these ops is minimal.
>>> Why do we need a general infrastructure for delayed attr operations?
>>>
>>> Thanks,
>>> Amir.
>>>
>> Great question, this one goes back a ways. I believe the train of logic
>> we had is that because parent pointers also include the filename of the
>> parent, it's possible we can end up with really big attributes. Which
>> may run into a lot of block map/unmap activity for name space changes.
>> We didn't want to end up with overly large transactions in the log, so we
>> wanted to break them up by returning -EAGAIN wherever the transactions
>> used to be rolled. I'm pretty sure that covers a quick high level
>> history of where we are now? Did that answer your question?
>>
>
> Partly.
> My question was like this:
> It seems that your work is about implementing:
> [intent to set xattr <new parent inode,gen,offset> <new name>]
> [intent to remove xattr <old parent inode,gen,offset> <old name>]
>
> While at a high level what the user really *intends* to do is:
> [intent to link <inode> to <new parent inode>;<new name>]
> [intent to unlink <inode> from <old parent inode>;<old name>]
>
> I guess the log item sizes of the two variants are quite similar, so it
> doesn't make much of a difference and deferred xattr ops are more
> generic and may be used for other things in the future.
>
> Another thing is that the transaction space required from directory
> entry changes is (probably) already taken into account correctly
> in the code, so there is no need to worry about deferred namespace
> operations from that aspect, but from a pure design perspective,
> if namespace operations become complex, *they* are the ones
> that should be made into deferred operations.
>
> Or maybe I am not reading the situation correctly at all...

Ok, I think I understand what you're trying to say. Would it help to
explain then that setting or removing an attr becomes part of the
namespace operations later? When we get up into the parent pointer set,
a lot of those patches add an attribute set or remove every time we
link, unlink, rename etc. Did that help answer your question?

Allison

>
> Thanks,
> Amir.
>
> Ok, I think I understand what you're trying to say. Would it help to
> explain then that setting or removing an attr becomes part of the
> namespace operations later? When we get up into the parent pointer set,
> a lot of those patches add an attribute set or remove every time we
> link, unlink, rename etc. Did that help answer your question?
>

I will wait for the parent pointers series,

Thanks,
Amir.
On Sun, Feb 23, 2020 at 09:55:48AM +0200, Amir Goldstein wrote:
> On Sun, Feb 23, 2020 at 4:06 AM Allison Collins
> <allison.henderson@oracle.com> wrote:
> >
> > Hi all,
> >
> > This set is a subset of a larger series for delayed attributes. Which is
> > a subset of an even larger series, parent pointers. Delayed attributes
> > allow attribute operations (set and remove) to be logged and committed
> > in the same way that other delayed operations do. This allows more
> > complex operations (like parent pointers) to be broken up into multiple
> > smaller transactions. To do this, the existing attr operations must be
> > modified to operate as either a delayed operation or an inline operation
> > since older filesystems will not be able to use the new log entries.
>
> High level question, before I dive into the series:
>
> Which other "delayed operations" already exist?

See Chandan's answer :P

> I think delayed operations were added by Darrick to handle the growth of
> transaction size due to reflink. Right? So I assume the existing delayed
> operations deal with block accounting.

No, they are intended to allow atomic, recoverable multi-transaction
operations. They grew out of this:

https://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Atomic_Multi-Transaction_Operations

which was essentially a generalisation of the EFI/EFD intent logging
that has existed in XFS for 20 years.

Essentially, it is a mechanism of chaining intent operations to
ensure that recovery will restart the operation at the point the
system failed, so that once the operation is started (i.e. first
intent is logged to the journal) the entire operation is always
completed regardless of whether the system crashes or not.

> When speaking of parent pointers, without having looked into the details yet,
> it seems the delayed operations we would want to log are operations that deal
> with namespace changes, i.e.: link, unlink, rename.
> The information needed to be logged for these ops is minimal.

Not really. The parent pointers are held in attributes, so parent
pointers are effectively adding an attribute creation to every inode
allocation and an attribute modification to every directory
modification. And, well, when an inode has 100 million hard links,
it's going to have 100 million parent pointer attributes.

Modifying a link is then a major operation, and Chandan has done a
great job in analysing the attr btree to see if there are
scalability issues that will be exposed by this sort of attribute
usage....

> Why do we need a general infrastructure for delayed attr operations?

These have to be done atomically with the create/unlink/rename/etc
and to include attribute modification in those transaction
reservations blows the size of them out massively (especially
rename!). By converting these operations to use deferred operations
to add the parent pointer to the inode, we no longer need to
increase the log reservation for the operations (because the attr
reservation is usually smaller than the directory reservation), and
it is guaranteed to be atomic with the directory modification. i.e.
parent pointers never get out of sync, even when the system crashes.

Hence having attributes modified as a series of individual
operations chained together into an atomic whole via intents is a
pre-requisite for updating attributes atomically within directory
modification operations.

Cheers,

Dave.
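As a rough illustration of the intent logging idea described above, the sketch below models a journal as an array of "intent" and "done" records and shows how a recovery pass could find operations that were started but never finished. It is a toy model under stated assumptions: the `demo_*` names and the record layout are hypothetical, real XFS log items carry far more information, and the sketch only finds unfinished intents rather than replaying them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical journal record types, loosely modelled on intent/done pairs. */
enum demo_rec_type {
	DEMO_INTENT,	/* "I am going to do operation <id>" */
	DEMO_DONE,	/* "operation <id> has been completed" */
};

struct demo_rec {
	enum demo_rec_type	type;
	int			id;
};

/*
 * Scan the journal and collect the IDs of intents that have no later
 * matching done record. Those are the operations that recovery would
 * have to restart. Returns the number of IDs stored in 'pending'.
 */
static size_t demo_recover(const struct demo_rec *log, size_t nrecs,
			   int *pending, size_t max_pending)
{
	size_t npending = 0;

	for (size_t i = 0; i < nrecs; i++) {
		bool done = false;

		if (log[i].type != DEMO_INTENT)
			continue;

		/* Look for a done record that closes this intent. */
		for (size_t j = i + 1; j < nrecs; j++) {
			if (log[j].type == DEMO_DONE &&
			    log[j].id == log[i].id) {
				done = true;
				break;
			}
		}
		if (!done && npending < max_pending)
			pending[npending++] = log[i].id;
	}
	return npending;
}
```

In a chained operation, the done record for one step and the intent for the next would be logged together, so a crash at any point leaves at most one open intent for recovery to pick up. The sketch does not model that pairing; it only shows the scan for open intents.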