diff mbox series

[v3,16/16] midx.c: improve cache locality in midx_pack_order_cmp()

Message ID 550e785f10ba14f166958501c007b75a04052a0d.1615482270.git.me@ttaylorr.com (mailing list archive)
State Superseded
Series midx: implement a multi-pack reverse index

Commit Message

Taylor Blau March 11, 2021, 5:05 p.m. UTC
From: Jeff King <peff@peff.net>

There is a lot of pointer dereferencing in the pre-image version of
'midx_pack_order_cmp()', which this patch gets rid of.

Instead of comparing the pack preferred-ness and then the pack id, both
of these checks are done at the same time by using the high-order bit of
the pack id to represent whether it's preferred. Then the pack id and
offset are compared as usual.

This produces the same result so long as there are fewer than 2^31 packs,
which seems like a safe assumption to make in practice.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 55 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 26 deletions(-)

Comments

Jeff King March 29, 2021, 12:59 p.m. UTC | #1
On Thu, Mar 11, 2021 at 12:05:42PM -0500, Taylor Blau wrote:

> From: Jeff King <peff@peff.net>
> 
> There is a lot of pointer dereferencing in the pre-image version of
> 'midx_pack_order_cmp()', which this patch gets rid of.
> 
> Instead of comparing the pack preferred-ness and then the pack id, both
> of these checks are done at the same time by using the high-order bit of
> the pack id to represent whether it's preferred. Then the pack id and
> offset are compared as usual.
> 
> This produces the same result so long as there are less than 2^31 packs,
> which seems like a likely assumption to make in practice.

Obviously this patch is brilliant. ;)

Did we record any numbers to show the improvement here? I don't think it
can be demonstrated with this series (since most of the code is dead),
but I recall that this was motivated by a noticeable slowdown.

I briefly wondered whether the complicated midx_pack_order_cmp() in
pack-revindex.c, which is used for the bsearch() there, could benefit
from the same speedup. It's only log(n), of course, instead of n*log(n),
but one might imagine making "n" calls to it. I don't think it makes
sense, though. The pointer dereferencing there is into the midx mmap
itself. Creating an auxiliary array would defeat the purpose.

-Peff
Taylor Blau March 29, 2021, 9:34 p.m. UTC | #2
On Mon, Mar 29, 2021 at 08:59:12AM -0400, Jeff King wrote:
> On Thu, Mar 11, 2021 at 12:05:42PM -0500, Taylor Blau wrote:
>
> > From: Jeff King <peff@peff.net>
> >
> > There is a lot of pointer dereferencing in the pre-image version of
> > 'midx_pack_order_cmp()', which this patch gets rid of.
> >
> > Instead of comparing the pack preferred-ness and then the pack id, both
> > of these checks are done at the same time by using the high-order bit of
> > the pack id to represent whether it's preferred. Then the pack id and
> > offset are compared as usual.
> >
> > This produces the same result so long as there are less than 2^31 packs,
> > which seems like a likely assumption to make in practice.
>
> Obviously this patch is brilliant. ;)

Obviously.

> Did we record any numbers to show the improvement here? I don't think it
> can be demonstrated with this series (since most of the code is dead),
> but I recall that this was motivated by a noticeable slowdown.

Looking through our messages, you wrote that this seemed to produce a
0.8-second speed-up on a large-ish repository that we were testing.
That's not significant overall; the fact that we were spending so long
here probably just caught our attention when looking at a profiler.

I could go either way on mentioning it. It does feel a little like
cheating to say, "well, if you applied these other patches it would make
it about this much faster". So I'm mostly happy to just keep it vague
and say that it makes things a little faster, unless you feel strongly
otherwise.

> I briefly wondered whether the complicated midx_pack_order_cmp() in
> pack-revindex.c, which is used for the bsearch() there, could benefit
> from the same speedup. It's only log(n), of course, instead of n*log(n),
> but one might imagine making "n" calls to it. I don't think it makes
> sense, though. The pointer dereferencing there is into the midx mmap
> itself. Creating an auxiliary array would defeat the purpose.

Right.

> -Peff

Thanks,
Taylor
Jeff King March 30, 2021, 7:15 a.m. UTC | #3
On Mon, Mar 29, 2021 at 05:34:21PM -0400, Taylor Blau wrote:

> > Did we record any numbers to show the improvement here? I don't think it
> > can be demonstrated with this series (since most of the code is dead),
> > but I recall that this was motivated by a noticeable slowdown.
> 
> Looking through our messages, you wrote that this seemed to produce a
> .8 second speed-up on a large-ish repository that we were testing.
> That's not significant overall, the fact that we were spending so long
> probably caught our attention when looking at a profiler.

That sounds about right from my recollection.

> I could go either way on mentioning it. It does feel a little like
> cheating to say, "well, if you applied these other patches it would make
> it about this much faster". So I'm mostly happy to just keep it vague
> and say that it makes things a little faster, unless you feel strongly
> otherwise.

No, I don't feel strongly. I just wanted to give people reading a sense
of what to expect. Now we have.

-Peff

Patch

diff --git a/midx.c b/midx.c
index eea9574d92..4835cc13d1 100644
--- a/midx.c
+++ b/midx.c
@@ -828,46 +828,49 @@  static int write_midx_large_offsets(struct hashfile *f,
 	return 0;
 }
 
-static int midx_pack_order_cmp(const void *va, const void *vb, void *_ctx)
+struct midx_pack_order_data {
+	uint32_t nr;
+	uint32_t pack;
+	off_t offset;
+};
+
+static int midx_pack_order_cmp(const void *va, const void *vb)
 {
-	struct write_midx_context *ctx = _ctx;
-
-	struct pack_midx_entry *a = &ctx->entries[*(const uint32_t *)va];
-	struct pack_midx_entry *b = &ctx->entries[*(const uint32_t *)vb];
-
-	uint32_t perm_a = ctx->pack_perm[a->pack_int_id];
-	uint32_t perm_b = ctx->pack_perm[b->pack_int_id];
-
-	/* Sort objects in the preferred pack ahead of any others. */
-	if (a->preferred > b->preferred)
+	const struct midx_pack_order_data *a = va, *b = vb;
+	if (a->pack < b->pack)
 		return -1;
-	if (a->preferred < b->preferred)
+	else if (a->pack > b->pack)
 		return 1;
-
-	/* Then, order objects by which packs they appear in. */
-	if (perm_a < perm_b)
+	else if (a->offset < b->offset)
 		return -1;
-	if (perm_a > perm_b)
+	else if (a->offset > b->offset)
 		return 1;
-
-	/* Then, disambiguate by their offset within each pack. */
-	if (a->offset < b->offset)
-		return -1;
-	if (a->offset > b->offset)
-		return 1;
-
-	return 0;
+	else
+		return 0;
 }
 
 static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 {
+	struct midx_pack_order_data *data;
 	uint32_t *pack_order;
 	uint32_t i;
 
+	ALLOC_ARRAY(data, ctx->entries_nr);
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *e = &ctx->entries[i];
+		data[i].nr = i;
+		data[i].pack = ctx->pack_perm[e->pack_int_id];
+		if (!e->preferred)
+			data[i].pack |= (1U << 31);
+		data[i].offset = e->offset;
+	}
+
+	QSORT(data, ctx->entries_nr, midx_pack_order_cmp);
+
 	ALLOC_ARRAY(pack_order, ctx->entries_nr);
 	for (i = 0; i < ctx->entries_nr; i++)
-		pack_order[i] = i;
-	QSORT_S(pack_order, ctx->entries_nr, midx_pack_order_cmp, ctx);
+		pack_order[i] = data[i].nr;
+	free(data);
 
 	return pack_order;
 }