xfsdocs: capture some information about dirs vs. attrs and how they use dabtrees

Message ID 20200408232753.GC6741@magnolia
State Superseded

Commit Message

Darrick J. Wong April 8, 2020, 11:27 p.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

Dave and I had a short discussion about whether or not xattr trees
needed to have the same free space tracking that directories have, and
a comparison of how each of the two metadata types interact with
dabtrees resulted.  I've reworked this a bit to make it flow better as a
book chapter, so here we go.

Original-mail: https://lore.kernel.org/linux-xfs/20200404085203.1908-1-chandanrlinux@gmail.com/T/#mdd12ad06cf5d635772cc38946fc5b22e349e136f
Originally-from: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 .../extended_attributes.asciidoc                   |   49 ++++++++++++++++++++
 1 file changed, 49 insertions(+)

Comments

Dave Chinner April 9, 2020, 12:16 a.m. UTC | #1
On Wed, Apr 08, 2020 at 04:27:53PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Dave and I had a short discussion about whether or not xattr trees
> needed to have the same free space tracking that directories have, and
> a comparison of how each of the two metadata types interact with
> dabtrees resulted.  I've reworked this a bit to make it flow better as a
> book chapter, so here we go.
> 
> Original-mail: https://lore.kernel.org/linux-xfs/20200404085203.1908-1-chandanrlinux@gmail.com/T/#mdd12ad06cf5d635772cc38946fc5b22e349e136f
> Originally-from: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Couple of things.

We are talking about btrees and where the record data is being
stored (internal or external). Hence I think it makes sense to refer
to "attribute records" and "directory records" (or "dirent records")
rather than "attributes" and "directory entries"...

"leaves" -> "leaf nodes"

> ---
>  .../extended_attributes.asciidoc                   |   49 ++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 
> diff --git a/design/XFS_Filesystem_Structure/extended_attributes.asciidoc b/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> index 99f7b35..d61c649 100644
> --- a/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> +++ b/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> @@ -910,3 +910,52 @@ Log sequence number of the last write to this block.
>  
>  Filesystems formatted prior to v5 do not have this header in the remote block.
>  Value data begins immediately at offset zero.
> +
> +== Key Differences Between Directories and Extended Attributes
> +
> +Though directories and extended attributes can take advantage of the same
> +variable length record btree structures (i.e. the dabtree) to map name hashes
> +to disk blocks, there are major differences in the ways that each of those
> +users embed the btree within the information that they are storing.
> +
> +Directory blocks require external free space tracking because the directory
> +blocks are not part of the dabtree itself.  The dabtree leaves for a directory
> +map name hashes to external directory data blocks.  Extended attributes, on

"The dabtree leaves for ...." implies it is going somewhere, not
that you are talking about leaf nodes. :) Perhaps:

"The directory dabtree leaf nodes contain a mapping between name
hash and the location of the dirent record in the external directory
data blocks."

> +the other hand, store all of the attributes in the leaves of the dabtree.

"... store the attribute records directly in the dabtree leaf
nodes."

> +
> +When we add or remove an extended attribute in the dabtree, we split or merge
> +leaves of the tree based on where the name hash index tells us a leaf needs to
> +be inserted into or removed.  In other words, we make space available or
> +collapse sparse leaves of the dabtree as a side effect of inserting or
> +removing attributes.
> +
> +The directory structure is very different.  Directory entries cannot change
> +location because each entry's logical offset into the directory data segment
> +is used as the readdir/seekdir/telldir cookie, and the cookie is required to
> +be stable for the life of the entry.  Therefore, we cannot store directory
> +entries in the leaves of a dabtree (which is indexed in hash order) because

The userspace readdir/seekdir/telldir directory cookie API places a
requirement on the directory structure that dirent record cookie
cannot change for the life of the dirent record. We use the dirent
record's logical offset into the directory data segment for that
cookie, and hence the dirent record cannot change location.
Therefore, we cannot store directory records in the leaf nodes of
the dabtree....

> +the offset into the tree would change as other entries are inserted and
> +removed.  Hence when we remove directory entries, we must leave holes in the
> +data segment so the rest of the entries do not move.
> +
> +The directory name hash index (the dabtree bit) is held in the second
> +directory segment.  Because the dabtree only stores pointers to directory
> +entries in the (first) data segment, there is no need to leave holes in the
> +dabtree itself.  The dabtree merges or splits leaves as required as pointers
> +to the directory data segment are added or removed.  The dabtree itself needs
> +no free space tracking.
> +
> +When we go to add a directory entry, we need to find the best-fitting free

s/go to//

> +space in the directory data segment to turn into the new entry.  This requires
> +a free space index for the directory data segment.  The free space index is
> +held in the third directory segment.  Once we've used the free space index to
> +find the block with that best free space, we modify the directory data block
> +and update the dabtree to point the name hash at the new entry.
> +
> +In other words, the requirement for a free space map in the directory
> +structure results from storing the directory entry data externally to the
> +dabtree.  Extended attributes are stored directly in the leaves of the

dabtree leaf nodes

> +dabtree (except for remote attributes which can be anywhere in the attr fork
> +address space) and do not need external free space tracking to determine where
> +to best insert them.  As a result, extended attributes exhibit nearly perfect
> +scaling until we run out of memory.

Thanks for doing this, Darrick!

-Dave.

Darrick J. Wong April 13, 2020, 7:37 p.m. UTC | #2
On Thu, Apr 09, 2020 at 10:16:08AM +1000, Dave Chinner wrote:
> On Wed, Apr 08, 2020 at 04:27:53PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Dave and I had a short discussion about whether or not xattr trees
> > needed to have the same free space tracking that directories have, and
> > a comparison of how each of the two metadata types interact with
> > dabtrees resulted.  I've reworked this a bit to make it flow better as a
> > book chapter, so here we go.
> > 
> > Original-mail: https://lore.kernel.org/linux-xfs/20200404085203.1908-1-chandanrlinux@gmail.com/T/#mdd12ad06cf5d635772cc38946fc5b22e349e136f
> > Originally-from: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Couple of things.
> 
> We are talking about btrees and where the record data is being
> stored (internal or external). Hence I think it makes sense to refer
> to "attribute records" and "directory records" (or "dirent records")
> rather than "attributes" and "directory entries"...

Ok, I'll clean that up.

> "leaves" -> "leaf nodes"

Fixed.

> > ---
> >  .../extended_attributes.asciidoc                   |   49 ++++++++++++++++++++
> >  1 file changed, 49 insertions(+)
> > 
> > diff --git a/design/XFS_Filesystem_Structure/extended_attributes.asciidoc b/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> > index 99f7b35..d61c649 100644
> > --- a/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> > +++ b/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> > @@ -910,3 +910,52 @@ Log sequence number of the last write to this block.
> >  
> >  Filesystems formatted prior to v5 do not have this header in the remote block.
> >  Value data begins immediately at offset zero.
> > +
> > +== Key Differences Between Directories and Extended Attributes
> > +
> > +Though directories and extended attributes can take advantage of the same
> > +variable length record btree structures (i.e. the dabtree) to map name hashes
> > +to disk blocks, there are major differences in the ways that each of those
> > +users embed the btree within the information that they are storing.
> > +
> > +Directory blocks require external free space tracking because the directory
> > +blocks are not part of the dabtree itself.  The dabtree leaves for a directory
> > +map name hashes to external directory data blocks.  Extended attributes, on
> 
> "The dabtree leaves for ...." implies it is going somewhere, not
> that you are talking about leaf nodes. :) Perhaps:
> 
> "The directory dabtree leaf nodes contain a mapping between name
> hash and the location of the dirent record in the external directory
> data blocks."

<nod>

> > +the other hand, store all of the attributes in the leaves of the dabtree.
> 
> "... store the attribute records directly in the dabtree leaf
> nodes."

<nod>

> > +
> > +When we add or remove an extended attribute in the dabtree, we split or merge
> > +leaves of the tree based on where the name hash index tells us a leaf needs to
> > +be inserted into or removed.  In other words, we make space available or
> > +collapse sparse leaves of the dabtree as a side effect of inserting or
> > +removing attributes.
> > +
> > +The directory structure is very different.  Directory entries cannot change
> > +location because each entry's logical offset into the directory data segment
> > +is used as the readdir/seekdir/telldir cookie, and the cookie is required to
> > +be stable for the life of the entry.  Therefore, we cannot store directory
> > +entries in the leaves of a dabtree (which is indexed in hash order) because
> 
> The userspace readdir/seekdir/telldir directory cookie API places a
> requirement on the directory structure that dirent record cookie
> cannot change for the life of the dirent record. We use the dirent
> record's logical offset into the directory data segment for that
> cookie, and hence the dirent record cannot change location.
> Therefore, we cannot store directory records in the leaf nodes of
> the dabtree....

Ok, I'll massage that in. :)

> > +the offset into the tree would change as other entries are inserted and
> > +removed.  Hence when we remove directory entries, we must leave holes in the
> > +data segment so the rest of the entries do not move.
> > +
> > +The directory name hash index (the dabtree bit) is held in the second
> > +directory segment.  Because the dabtree only stores pointers to directory
> > +entries in the (first) data segment, there is no need to leave holes in the
> > +dabtree itself.  The dabtree merges or splits leaves as required as pointers
> > +to the directory data segment are added or removed.  The dabtree itself needs
> > +no free space tracking.
> > +
> > +When we go to add a directory entry, we need to find the best-fitting free
> 
> s/go to//

Fixed.

> > +space in the directory data segment to turn into the new entry.  This requires
> > +a free space index for the directory data segment.  The free space index is
> > +held in the third directory segment.  Once we've used the free space index to
> > +find the block with that best free space, we modify the directory data block
> > +and update the dabtree to point the name hash at the new entry.
> > +
> > +In other words, the requirement for a free space map in the directory
> > +structure results from storing the directory entry data externally to the
> > +dabtree.  Extended attributes are stored directly in the leaves of the
> 
> dabtree leaf nodes

Fixed.

> > +dabtree (except for remote attributes which can be anywhere in the attr fork
> > +address space) and do not need external free space tracking to determine where
> > +to best insert them.  As a result, extended attributes exhibit nearly perfect
> > +scaling until we run out of memory.
> 
> Thanks for doing this, Darrick!

NP.  v2 is on its way.

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Patch

diff --git a/design/XFS_Filesystem_Structure/extended_attributes.asciidoc b/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
index 99f7b35..d61c649 100644
--- a/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
+++ b/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
@@ -910,3 +910,52 @@  Log sequence number of the last write to this block.
 
 Filesystems formatted prior to v5 do not have this header in the remote block.
 Value data begins immediately at offset zero.
+
+== Key Differences Between Directories and Extended Attributes
+
+Though directories and extended attributes can take advantage of the same
+variable length record btree structures (i.e. the dabtree) to map name hashes
+to disk blocks, there are major differences in the ways that each of those
+users embed the btree within the information that they are storing.
+
+Directory blocks require external free space tracking because the directory
+blocks are not part of the dabtree itself.  The dabtree leaves for a directory
+map name hashes to external directory data blocks.  Extended attributes, on
+the other hand, store all of the attributes in the leaves of the dabtree.
+
+When we add or remove an extended attribute in the dabtree, we split or merge
+leaves of the tree based on where the name hash index tells us a leaf needs to
+be inserted into or removed.  In other words, we make space available or
+collapse sparse leaves of the dabtree as a side effect of inserting or
+removing attributes.
+
+The directory structure is very different.  Directory entries cannot change
+location because each entry's logical offset into the directory data segment
+is used as the readdir/seekdir/telldir cookie, and the cookie is required to
+be stable for the life of the entry.  Therefore, we cannot store directory
+entries in the leaves of a dabtree (which is indexed in hash order) because
+the offset into the tree would change as other entries are inserted and
+removed.  Hence when we remove directory entries, we must leave holes in the
+data segment so the rest of the entries do not move.
+
+The directory name hash index (the dabtree bit) is held in the second
+directory segment.  Because the dabtree only stores pointers to directory
+entries in the (first) data segment, there is no need to leave holes in the
+dabtree itself.  The dabtree merges or splits leaves as required as pointers
+to the directory data segment are added or removed.  The dabtree itself needs
+no free space tracking.
+
+When we go to add a directory entry, we need to find the best-fitting free
+space in the directory data segment to turn into the new entry.  This requires
+a free space index for the directory data segment.  The free space index is
+held in the third directory segment.  Once we've used the free space index to
+find the block with that best free space, we modify the directory data block
+and update the dabtree to point the name hash at the new entry.
+
+In other words, the requirement for a free space map in the directory
+structure results from storing the directory entry data externally to the
+dabtree.  Extended attributes are stored directly in the leaves of the
+dabtree (except for remote attributes which can be anywhere in the attr fork
+address space) and do not need external free space tracking to determine where
+to best insert them.  As a result, extended attributes exhibit nearly perfect
+scaling until we run out of memory.
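
As a rough illustration of the contrast the new section draws (this is not part of the patch, and none of it is XFS code), the two storage strategies can be modelled in a few lines of Python. Everything here is hypothetical: `toy_hash`, `ToyAttrLeaf` and `ToyDirectory` are made-up names, the hash is a byte sum rather than the real XFS name hash, and records are fixed-size slots, so "best fit" degenerates to "lowest free slot". The only point is to show why hash-ordered storage can compact freely while offset-addressed storage has to leave holes and track them.

```python
import bisect


def toy_hash(name):
    # Hypothetical stand-in for a name hash; NOT the real XFS hash function.
    return sum(name.encode()) & 0xFFFFFFFF


class ToyAttrLeaf:
    """Attr-style storage: records live inside the hash-ordered index itself."""

    def __init__(self):
        self.records = []  # sorted (hash, name, value) tuples

    def add(self, name, value):
        # Inserting in hash order may shift every record after the insert point.
        bisect.insort(self.records, (toy_hash(name), name, value))

    def remove(self, name):
        # Removal compacts the list; no hole is left behind.
        self.records = [r for r in self.records if r[1] != name]

    def position(self, name):
        # Position of a record within the leaf; it is not stable across updates.
        for i, rec in enumerate(self.records):
            if rec[1] == name:
                return i
        return None


class ToyDirectory:
    """Directory-style storage: records live outside the hash index."""

    def __init__(self):
        self.data = []   # data segment; a slot's offset is its cookie, None is a hole
        self.free = []   # free space index: sorted offsets of holes
        self.index = []  # hash index: sorted (hash, offset) pointers into data

    def _find(self, name):
        # Return the position in the hash index of the pointer for name.
        h = toy_hash(name)
        i = bisect.bisect_left(self.index, (h, -1))
        while i < len(self.index) and self.index[i][0] == h:
            if self.data[self.index[i][1]] == name:
                return i
            i += 1
        return None

    def lookup(self, name):
        i = self._find(name)
        return None if i is None else self.index[i][1]

    def add(self, name):
        # Consult the free space index first; grow the data segment otherwise.
        if self.free:
            off = self.free.pop(0)
            self.data[off] = name
        else:
            off = len(self.data)
            self.data.append(name)
        bisect.insort(self.index, (toy_hash(name), off))
        return off

    def remove(self, name):
        i = self._find(name)
        if i is None:
            raise KeyError(name)
        off = self.index.pop(i)[1]     # the hash index compacts freely
        self.data[off] = None          # the data segment keeps a hole
        bisect.insort(self.free, off)  # and the hole must be tracked
        return off
```

In this toy model, removing an entry from a `ToyDirectory` never changes the offset returned by `lookup()` for any surviving entry, and a later `add()` reuses the hole recorded in `free`. A `ToyAttrLeaf` needs no such bookkeeping, but the value returned by `position()` for a given record can change whenever another record with a smaller hash is added or removed, which is why that position could not serve as a stable readdir cookie.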