diff mbox series

[5/5] xfs: reverse search directory freespace indexes

Message ID 20190829063042.22902-6-david@fromorbit.com (mailing list archive)
State Accepted
Headers show
Series xfs: speed up large directory modifications | expand

Commit Message

Dave Chinner Aug. 29, 2019, 6:30 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

When a directory is growing rapidly, new blocks tend to get added at
the end of the directory. These end up at the end of the freespace
index, and when the directory gets large finding these new
freespaces gets expensive. The code does a linear search across the
frespace index from the first block in the directory to the last,
hence meaning the newly added space is the last index searched.

Instead, do a reverse order index search, starting from the last
block and index in the freespace index. This makes most lookups for
free space on rapidly growing directories O(1) instead of O(N), but
should not have any impact on random insert workloads because the
average search length is the same regardless of which end of the
array we start at.

The result is a major improvement in large directory grow rates:

		create time(sec) / rate (files/s)
 File count     vanilla             Prev commit		Patched
  10k	      0.41 / 24.3k	   0.42 / 23.8k       0.41 / 24.3k
  20k	      0.74 / 27.0k	   0.76 / 26.3k       0.75 / 26.7k
 100k	      3.81 / 26.4k	   3.47 / 28.8k       3.27 / 30.6k
 200k	      8.58 / 23.3k	   7.19 / 27.8k       6.71 / 29.8k
   1M	     85.69 / 11.7k	  48.53 / 20.6k      37.67 / 26.5k
   2M	    280.31 /  7.1k	 130.14 / 15.3k      79.55 / 25.2k
  10M	   3913.26 /  2.5k                          552.89 / 18.1k

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_dir2_node.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

Comments

Christoph Hellwig Aug. 29, 2019, 8:23 a.m. UTC | #1
On Thu, Aug 29, 2019 at 04:30:42PM +1000, Dave Chinner wrote:
> 		create time(sec) / rate (files/s)
>  File count     vanilla             Prev commit		Patched
>   10k	      0.41 / 24.3k	   0.42 / 23.8k       0.41 / 24.3k
>   20k	      0.74 / 27.0k	   0.76 / 26.3k       0.75 / 26.7k
>  100k	      3.81 / 26.4k	   3.47 / 28.8k       3.27 / 30.6k
>  200k	      8.58 / 23.3k	   7.19 / 27.8k       6.71 / 29.8k
>    1M	     85.69 / 11.7k	  48.53 / 20.6k      37.67 / 26.5k
>    2M	    280.31 /  7.1k	 130.14 / 15.3k      79.55 / 25.2k
>   10M	   3913.26 /  2.5k                          552.89 / 18.1k

Impressive!

> Signed-Off-By: Dave Chinner <dchinner@redhat.com>

FYI, the Off here should be all lower case.  Patch 2 actually has the
same issue, but I only noticed it here.

> @@ -1781,6 +1782,9 @@ xfs_dir2_node_find_freeblk(
>  		 */
>  		ifbno = fblk->blkno;
>  		fbno = ifbno;
> +		xfs_trans_brelse(tp, fbp);
> +		fbp = NULL;
> +		fblk->bp = NULL;

Hmm.  Doesn't this actually belong into the previous patch?
Dave Chinner Aug. 29, 2019, 9:14 a.m. UTC | #2
On Thu, Aug 29, 2019 at 01:23:10AM -0700, Christoph Hellwig wrote:
> On Thu, Aug 29, 2019 at 04:30:42PM +1000, Dave Chinner wrote:
> > 		create time(sec) / rate (files/s)
> >  File count     vanilla             Prev commit		Patched
> >   10k	      0.41 / 24.3k	   0.42 / 23.8k       0.41 / 24.3k
> >   20k	      0.74 / 27.0k	   0.76 / 26.3k       0.75 / 26.7k
> >  100k	      3.81 / 26.4k	   3.47 / 28.8k       3.27 / 30.6k
> >  200k	      8.58 / 23.3k	   7.19 / 27.8k       6.71 / 29.8k
> >    1M	     85.69 / 11.7k	  48.53 / 20.6k      37.67 / 26.5k
> >    2M	    280.31 /  7.1k	 130.14 / 15.3k      79.55 / 25.2k
> >   10M	   3913.26 /  2.5k                          552.89 / 18.1k
> 
> Impressive!

Yeah, i think the  drop-off in performance at 10M inodes is caused
by running out of RAM at about 6M inodes, and so there's extra CPU
time spent in direct memory reclaim after than so the single
threaded create performance is lower from as a result.

> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> 
> FYI, the Off here should be all lower case.  Patch 2 actually has the
> same issue, but I only noticed it here.

Yeah, 3 of 5 patches are like that. No idea how - I use macros for
all these sobs and rvbs and they all use lower case....

> > @@ -1781,6 +1782,9 @@ xfs_dir2_node_find_freeblk(
> >  		 */
> >  		ifbno = fblk->blkno;
> >  		fbno = ifbno;
> > +		xfs_trans_brelse(tp, fbp);
> > +		fbp = NULL;
> > +		fblk->bp = NULL;
> 
> Hmm.  Doesn't this actually belong into the previous patch?

Yup, I'll move them.

-Dave.
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index c6a1d1cc638f..d8abbcbde055 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -1746,6 +1746,7 @@  xfs_dir2_node_find_freeblk(
 	struct xfs_inode	*dp = args->dp;
 	struct xfs_trans	*tp = args->trans;
 	struct xfs_buf		*fbp = NULL;
+	xfs_dir2_db_t		firstfbno;
 	xfs_dir2_db_t		lastfbno;
 	xfs_dir2_db_t		ifbno = -1;
 	xfs_dir2_db_t		dbno = -1;
@@ -1781,6 +1782,9 @@  xfs_dir2_node_find_freeblk(
 		 */
 		ifbno = fblk->blkno;
 		fbno = ifbno;
+		xfs_trans_brelse(tp, fbp);
+		fbp = NULL;
+		fblk->bp = NULL;
 	}
 
 	/*
@@ -1792,16 +1796,15 @@  xfs_dir2_node_find_freeblk(
 	if (error)
 		return error;
 	lastfbno = xfs_dir2_da_to_db(args->geo, (xfs_dablk_t)fo);
-
-	/* If we haven't get a search start block, set it now */
-	if (fbno == -1)
-		fbno = xfs_dir2_byte_to_db(args->geo, XFS_DIR2_FREE_OFFSET);
+	firstfbno = xfs_dir2_byte_to_db(args->geo, XFS_DIR2_FREE_OFFSET);
 
 	/*
 	 * While we haven't identified a data block, search the freeblock
-	 * data for a data block with enough free space in it.
+	 * data for a good data block. Do a reverse order search, as growing
+	 * directories will put new blocks with free space at the end of the
+	 * free space index.
 	 */
-	for ( ; fbno < lastfbno; fbno++) {
+	for (fbno = lastfbno - 1; fbno >= firstfbno; fbno--) {
 		/* If it's ifbno we already looked at it. */
 		if (fbno == ifbno)
 			continue;
@@ -1819,19 +1822,18 @@  xfs_dir2_node_find_freeblk(
 		if (!fbp)
 			continue;
 
-		findex = 0;
 		free = fbp->b_addr;
 		bests = dp->d_ops->free_bests_p(free);
 		dp->d_ops->free_hdr_from_disk(&freehdr, free);
 
 		/* Scan the free entry array for a large enough free space. */
-		do {
+		for (findex = freehdr.nvalid - 1; findex >= 0; findex--) {
 			if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
 			    be16_to_cpu(bests[findex]) >= length) {
 				dbno = freehdr.firstdb + findex;
 				goto found_block;
 			}
-		} while (++findex < freehdr.nvalid);
+		};
 
 		/* Didn't find free space, go on to next free block */
 		xfs_trans_brelse(tp, fbp);