[4/5] xfs: speed up directory bestfree block scanning
diff mbox series

Message ID 20190829063042.22902-5-david@fromorbit.com
State New
Headers show
Series
  • xfs: speed up large directory modifications
Related show

Commit Message

Dave Chinner Aug. 29, 2019, 6:30 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

When running a "create millions inodes in a directory" test
recently, I noticed we were spending a huge amount of time
converting freespace block headers from disk format to in-memory
format:

 31.47%  [kernel]  [k] xfs_dir2_node_addname
 17.86%  [kernel]  [k] xfs_dir3_free_hdr_from_disk
  3.55%  [kernel]  [k] xfs_dir3_free_bests_p

We shouldn't be hitting the best free block scanning code so hard
when doing sequential directory creates, and it turns out there's
a highly suboptimal loop searching the the best free array in
the freespace block - it decodes the block header before checking
each entry inside a loop, instead of decoding the header once before
running the entry search loop.

This makes a massive difference to create rates. Profile now looks
like this:

  13.15%  [kernel]  [k] xfs_dir2_node_addname
   3.52%  [kernel]  [k] xfs_dir3_leaf_check_int
   3.11%  [kernel]  [k] xfs_log_commit_cil

And the wall time/average file create rate differences are
just as stark:

		create time(sec) / rate (files/s)
File count	     vanilla		    patched
  10k		   0.41 / 24.3k		   0.42 / 23.8k
  20k		   0.74	/ 27.0k		   0.76 / 26.3k
 100k		   3.81	/ 26.4k		   3.47 / 28.8k
 200k		   8.58	/ 23.3k		   7.19 / 27.8k
   1M		  85.69	/ 11.7k		  48.53 / 20.6k
   2M		 280.31	/  7.1k		 130.14 / 15.3k

The larger the directory, the bigger the performance improvement.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_dir2_node.c | 101 +++++++++++++---------------------
 1 file changed, 37 insertions(+), 64 deletions(-)

Comments

Christoph Hellwig Aug. 29, 2019, 8:18 a.m. UTC | #1
On Thu, Aug 29, 2019 at 04:30:41PM +1000, Dave Chinner wrote:
> +		bests = dp->d_ops->free_bests_p(free);
> +		dp->d_ops->free_hdr_from_disk(&freehdr, free);
>  		if (findex >= 0) {
>  			/* caller already found the freespace for us. */
> -			bests = dp->d_ops->free_bests_p(free);
> -			dp->d_ops->free_hdr_from_disk(&freehdr, free);
> -

I don't see any way how this is needed or helpful with this patch,
we are just going to ovewrite bests and freehdr before even looking
at them if the branch is not taken.

>  			ASSERT(findex < freehdr.nvalid);
>  			ASSERT(be16_to_cpu(bests[findex]) != NULLDATAOFF);
>  			ASSERT(be16_to_cpu(bests[findex]) >= length);
>  			dbno = freehdr.firstdb + findex;
> -			goto out;
> +			goto found_block;

The label rename while more descriptive also seems entirely unrelated.

> +		findex = 0;
> +		free = fbp->b_addr;
>  		bests = dp->d_ops->free_bests_p(free);
>  		dp->d_ops->free_hdr_from_disk(&freehdr, free);
> +
> +		/* Scan the free entry array for a large enough free space. */
> +		do {
> +			if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
> +			    be16_to_cpu(bests[findex]) >= length) {
> +				dbno = freehdr.firstdb + findex;
> +				goto found_block;
>  			}
> +		} while (++findex < freehdr.nvalid);

Nit: wou;dn't this be better written as a for loop also taking the
initialization of findex into the loop?

Otherwise this looks good.  I always like it when a speedup removes
code..
Christoph Hellwig Aug. 29, 2019, 8:25 a.m. UTC | #2
Actually, another comment:

> +		/* Scan the free entry array for a large enough free space. */
> +		do {
> +			if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&

This could be changed to:

			if (bests[findex] != cpu_to_be16(NULLDATAOFF) &&

which might lead to slightly better code generation.
Dave Chinner Aug. 29, 2019, 8:45 a.m. UTC | #3
On Thu, Aug 29, 2019 at 01:18:22AM -0700, Christoph Hellwig wrote:
> On Thu, Aug 29, 2019 at 04:30:41PM +1000, Dave Chinner wrote:
> > +		bests = dp->d_ops->free_bests_p(free);
> > +		dp->d_ops->free_hdr_from_disk(&freehdr, free);
> >  		if (findex >= 0) {
> >  			/* caller already found the freespace for us. */
> > -			bests = dp->d_ops->free_bests_p(free);
> > -			dp->d_ops->free_hdr_from_disk(&freehdr, free);
> > -
> 
> I don't see any way how this is needed or helpful with this patch,
> we are just going to ovewrite bests and freehdr before even looking
> at them if the branch is not taken.

*nod*

The change is not useful anymore as a result of folding in your
previous suggestions. I'll revert it.

> >  			ASSERT(findex < freehdr.nvalid);
> >  			ASSERT(be16_to_cpu(bests[findex]) != NULLDATAOFF);
> >  			ASSERT(be16_to_cpu(bests[findex]) >= length);
> >  			dbno = freehdr.firstdb + findex;
> > -			goto out;
> > +			goto found_block;
> 
> The label rename while more descriptive also seems entirely unrelated.

That was one of your previous suggestions :)

I'll push it back up one patch into the cleanup patch and leave this
as an optimisation only patch.

> > +		findex = 0;
> > +		free = fbp->b_addr;
> >  		bests = dp->d_ops->free_bests_p(free);
> >  		dp->d_ops->free_hdr_from_disk(&freehdr, free);
> > +
> > +		/* Scan the free entry array for a large enough free space. */
> > +		do {
> > +			if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
> > +			    be16_to_cpu(bests[findex]) >= length) {
> > +				dbno = freehdr.firstdb + findex;
> > +				goto found_block;
> >  			}
> > +		} while (++findex < freehdr.nvalid);
> 
> Nit: wou;dn't this be better written as a for loop also taking the
> initialization of findex into the loop?

Agreed - the next patch does that with the reversal of the search
order. The end result is what you're asking for, so I'll leave this
alone for now....

> Otherwise this looks good.  I always like it when a speedup removes
> code..

I hadn't noticed that - I was more concerned with ending up with
readable code :)

Cheers,

Dave.
Christoph Hellwig Aug. 29, 2019, 8:47 a.m. UTC | #4
On Thu, Aug 29, 2019 at 06:45:30PM +1000, Dave Chinner wrote:
> > 
> > The label rename while more descriptive also seems entirely unrelated.
> 
> That was one of your previous suggestions :)
> 
> I'll push it back up one patch into the cleanup patch and leave this
> as an optimisation only patch.

Oh well.  Just keep it then :)

> > > +		/* Scan the free entry array for a large enough free space. */
> > > +		do {
> > > +			if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
> > > +			    be16_to_cpu(bests[findex]) >= length) {
> > > +				dbno = freehdr.firstdb + findex;
> > > +				goto found_block;
> > >  			}
> > > +		} while (++findex < freehdr.nvalid);
> > 
> > Nit: wou;dn't this be better written as a for loop also taking the
> > initialization of findex into the loop?
> 
> Agreed - the next patch does that with the reversal of the search
> order. The end result is what you're asking for, so I'll leave this
> alone for now....

If you touch this patch anyway please just switch to the for loop
here, that keeps the churn down in the next one.
Dave Chinner Aug. 29, 2019, 8:55 a.m. UTC | #5
On Thu, Aug 29, 2019 at 01:47:09AM -0700, Christoph Hellwig wrote:
> On Thu, Aug 29, 2019 at 06:45:30PM +1000, Dave Chinner wrote:
> > > 
> > > The label rename while more descriptive also seems entirely unrelated.
> > 
> > That was one of your previous suggestions :)
> > 
> > I'll push it back up one patch into the cleanup patch and leave this
> > as an optimisation only patch.
> 
> Oh well.  Just keep it then :)

It was introduced in the previous patch, anyway :P

> > > > +		/* Scan the free entry array for a large enough free space. */
> > > > +		do {
> > > > +			if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
> > > > +			    be16_to_cpu(bests[findex]) >= length) {
> > > > +				dbno = freehdr.firstdb + findex;
> > > > +				goto found_block;
> > > >  			}
> > > > +		} while (++findex < freehdr.nvalid);
> > > 
> > > Nit: wou;dn't this be better written as a for loop also taking the
> > > initialization of findex into the loop?
> > 
> > Agreed - the next patch does that with the reversal of the search
> > order. The end result is what you're asking for, so I'll leave this
> > alone for now....
> 
> If you touch this patch anyway please just switch to the for loop
> here, that keeps the churn down in the next one.

OK.
Dave Chinner Aug. 29, 2019, 9:31 a.m. UTC | #6
On Thu, Aug 29, 2019 at 01:25:01AM -0700, Christoph Hellwig wrote:
> Actually, another comment:
> 
> > +		/* Scan the free entry array for a large enough free space. */
> > +		do {
> > +			if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
> 
> This could be changed to:
> 
> 			if (bests[findex] != cpu_to_be16(NULLDATAOFF) &&
> 
> which might lead to slightly better code generation.

I don't think it will make any difference because the very next
comparison in the if() statement needs the cpu order bests[findex]
value because its a >= check. Hence we have to calculate it anyway,
and the compiler should be smart enough to only evaluate it once...

Cheers,

Dave.
Christoph Hellwig Aug. 29, 2019, 9:33 a.m. UTC | #7
On Thu, Aug 29, 2019 at 07:31:17PM +1000, Dave Chinner wrote:
> On Thu, Aug 29, 2019 at 01:25:01AM -0700, Christoph Hellwig wrote:
> > Actually, another comment:
> > 
> > > +		/* Scan the free entry array for a large enough free space. */
> > > +		do {
> > > +			if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
> > 
> > This could be changed to:
> > 
> > 			if (bests[findex] != cpu_to_be16(NULLDATAOFF) &&
> > 
> > which might lead to slightly better code generation.
> 
> I don't think it will make any difference because the very next
> comparison in the if() statement needs the cpu order bests[findex]
> value because its a >= check. Hence we have to calculate it anyway,
> and the compiler should be smart enough to only evaluate it once...

Yeah, makes sense.

Patch
diff mbox series

diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index 1d3d1c9b5961..c6a1d1cc638f 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -1751,8 +1751,8 @@  xfs_dir2_node_find_freeblk(
 	xfs_dir2_db_t		dbno = -1;
 	xfs_dir2_db_t		fbno = -1;
 	xfs_fileoff_t		fo;
-	__be16			*bests;
-	int			findex;
+	__be16			*bests = NULL;
+	int			findex = 0;
 	int			error;
 
 	/*
@@ -1764,16 +1764,15 @@  xfs_dir2_node_find_freeblk(
 		fbp = fblk->bp;
 		free = fbp->b_addr;
 		findex = fblk->index;
+		bests = dp->d_ops->free_bests_p(free);
+		dp->d_ops->free_hdr_from_disk(&freehdr, free);
 		if (findex >= 0) {
 			/* caller already found the freespace for us. */
-			bests = dp->d_ops->free_bests_p(free);
-			dp->d_ops->free_hdr_from_disk(&freehdr, free);
-
 			ASSERT(findex < freehdr.nvalid);
 			ASSERT(be16_to_cpu(bests[findex]) != NULLDATAOFF);
 			ASSERT(be16_to_cpu(bests[findex]) >= length);
 			dbno = freehdr.firstdb + findex;
-			goto out;
+			goto found_block;
 		}
 
 		/*
@@ -1783,8 +1782,6 @@  xfs_dir2_node_find_freeblk(
 		ifbno = fblk->blkno;
 		fbno = ifbno;
 	}
-	ASSERT(dbno == -1);
-	findex = 0;
 
 	/*
 	 * If we don't have a data block yet, we're going to scan the freespace
@@ -1802,69 +1799,45 @@  xfs_dir2_node_find_freeblk(
 
 	/*
 	 * While we haven't identified a data block, search the freeblock
-	 * data for a good data block.  If we find a null freeblock entry,
-	 * indicating a hole in the data blocks, remember that.
+	 * data for a data block with enough free space in it.
 	 */
-	while (dbno == -1) {
-		/*
-		 * If we don't have a freeblock in hand, get the next one.
-		 */
-		if (fbp == NULL) {
-			/*
-			 * If it's ifbno we already looked at it.
-			 */
-			if (++fbno == ifbno)
-				fbno++;
-			/*
-			 * If it's off the end we're done.
-			 */
-			if (fbno >= lastfbno)
-				break;
-			/*
-			 * Read the block.  There can be holes in the
-			 * freespace blocks, so this might not succeed.
-			 * This should be really rare, so there's no reason
-			 * to avoid it.
-			 */
-			error = xfs_dir2_free_try_read(tp, dp,
-					xfs_dir2_db_to_da(args->geo, fbno),
-					&fbp);
-			if (error)
-				return error;
-			if (!fbp)
-				continue;
-			free = fbp->b_addr;
-			findex = 0;
-		}
+	for ( ; fbno < lastfbno; fbno++) {
+		/* If it's ifbno we already looked at it. */
+		if (fbno == ifbno)
+			continue;
+
 		/*
-		 * Look at the current free entry.  Is it good enough?
-		 *
-		 * The bests initialisation should be where the bufer is read in
-		 * the above branch. But gcc is too stupid to realise that bests
-		 * and the freehdr are actually initialised if they are placed
-		 * there, so we have to do it here to avoid warnings. Blech.
+		 * Read the block.  There can be holes in the freespace blocks,
+		 * so this might not succeed.  This should be really rare, so
+		 * there's no reason to avoid it.
 		 */
+		error = xfs_dir2_free_try_read(tp, dp,
+				xfs_dir2_db_to_da(args->geo, fbno),
+				&fbp);
+		if (error)
+			return error;
+		if (!fbp)
+			continue;
+
+		findex = 0;
+		free = fbp->b_addr;
 		bests = dp->d_ops->free_bests_p(free);
 		dp->d_ops->free_hdr_from_disk(&freehdr, free);
-		if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
-		    be16_to_cpu(bests[findex]) >= length)
-			dbno = freehdr.firstdb + findex;
-		else {
-			/*
-			 * Are we done with the freeblock?
-			 */
-			if (++findex == freehdr.nvalid) {
-				/*
-				 * Drop the block.
-				 */
-				xfs_trans_brelse(tp, fbp);
-				fbp = NULL;
-				if (fblk && fblk->bp)
-					fblk->bp = NULL;
+
+		/* Scan the free entry array for a large enough free space. */
+		do {
+			if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
+			    be16_to_cpu(bests[findex]) >= length) {
+				dbno = freehdr.firstdb + findex;
+				goto found_block;
 			}
-		}
+		} while (++findex < freehdr.nvalid);
+
+		/* Didn't find free space, go on to next free block */
+		xfs_trans_brelse(tp, fbp);
 	}
-out:
+
+found_block:
 	*dbnop = dbno;
 	*fbpp = fbp;
 	*findexp = findex;