diff mbox series

[v3,09/10] pack-revindex: ensure that on-disk reverse indexes are given precedence

Message ID a66d2f9f7c20eeb813656e66b3ad9d42f2eecf34.1611617820.git.me@ttaylorr.com (mailing list archive)
State New, archived
Headers show
Series pack-revindex: introduce on-disk '.rev' format | expand

Commit Message

Taylor Blau Jan. 25, 2021, 11:37 p.m. UTC
When an on-disk reverse index exists, there is no need to generate one
in memory. In fact, doing so can be slow, and require large amounts of
the heap.

Let's make sure that we treat the on-disk reverse index with precedence
(i.e., that when it exists, we don't bother trying to generate an
equivalent one in memory) by teaching Git how to conditionally die()
when generating a reverse index in memory.

Then, add a test to ensure that when (a) an on-disk reverse index
exists, and (b) when setting GIT_TEST_REV_INDEX_DIE_IN_MEMORY, that we
do not die, implying that we read from the on-disk one.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-revindex.c          | 4 ++++
 pack-revindex.h          | 1 +
 t/t5325-reverse-index.sh | 9 +++++++++
 3 files changed, 14 insertions(+)

Comments

Jeff King Jan. 29, 2021, 12:53 a.m. UTC | #1
On Mon, Jan 25, 2021 at 06:37:46PM -0500, Taylor Blau wrote:

> When an on-disk reverse index exists, there is no need to generate one
> in memory. In fact, doing so can be slow, and require large amounts of
> the heap.
> 
> Let's make sure that we treat the on-disk reverse index with precedence
> (i.e., that when it exists, we don't bother trying to generate an
> equivalent one in memory) by teaching Git how to conditionally die()
> when generating a reverse index in memory.
> 
> Then, add a test to ensure that when (a) an on-disk reverse index
> exists, and (b) when setting GIT_TEST_REV_INDEX_DIE_IN_MEMORY, that we
> do not die, implying that we read from the on-disk one.

I don't love this kind of hackery, as it will have to live forever if we
want to keep testing this feature. But I also don't know how else to
tell in the regular test suite that we are avoiding the slow path.

Would it be enough to instead add a t/perf test which checks the speed
of:

  echo HEAD | git cat-file --batch-check='%(objectsize:disk)'

? I hate relying on the perf suite, because it gets run way less often
(and requires a lot of squinting to interpret the results). But it
wouldn't require any extra code the binary, as it's observing the actual
user-visible thing we care about: speed.

(I guess we care as much or more about peak heap usage, but measuring
that is a whole other can of worms).

-Peff
Taylor Blau Jan. 29, 2021, 1:25 a.m. UTC | #2
On Thu, Jan 28, 2021 at 07:53:34PM -0500, Jeff King wrote:
> On Mon, Jan 25, 2021 at 06:37:46PM -0500, Taylor Blau wrote:
>
> > When an on-disk reverse index exists, there is no need to generate one
> > in memory. In fact, doing so can be slow, and require large amounts of
> > the heap.
> >
> > Let's make sure that we treat the on-disk reverse index with precedence
> > (i.e., that when it exists, we don't bother trying to generate an
> > equivalent one in memory) by teaching Git how to conditionally die()
> > when generating a reverse index in memory.
> >
> > Then, add a test to ensure that when (a) an on-disk reverse index
> > exists, and (b) when setting GIT_TEST_REV_INDEX_DIE_IN_MEMORY, that we
> > do not die, implying that we read from the on-disk one.
>
> I don't love this kind of hackery, as it will have to live forever if we
> want to keep testing this feature. But I also don't know how else to
> tell in the regular test suite that we are avoiding the slow path.
>
> Would it be enough to instead add a t/perf test which checks the speed
> of:
>
>   echo HEAD | git cat-file --batch-check='%(objectsize:disk)'
>
> ? I hate relying on the perf suite, because it gets run way less often
> (and requires a lot of squinting to interpret the results). But it
> wouldn't require any extra code the binary, as it's observing the actual
> user-visible thing we care about: speed.
>
> (I guess we care as much or more about peak heap usage, but measuring
> that is a whole other can of worms).

Yeah, I think that the thing we care most about is peak RSS, but I agree
that I don't really want to wade into figuring out how to measure that
reliably during test time (I think there is a separate and less relevant
argument about whether that is something that we should be testing or
not).

But, getting back to your original comment, I think that I actually
prefer to have the GIT_TEST_XYZ_DIE stuff in the binary than I do
relying on the perf suite to catch stuff like this.

I understand your concern about the binary size. I guess you could
#ifdef this out and only build it in during the test suite, but then
you're testing a different binary, and so that calls into question the
integrity of the test suite itself, etc. etc.

So, I guess that's all to say that I while I do find this to be hack-y
and gross, I don't think that it's all that bad when you compare it to
the alternatives.

As usual, I'm happy to change it if you feel strongly that it should be
changed.

> -Peff

Thanks,
Taylor
Jeff King Jan. 30, 2021, 8:46 a.m. UTC | #3
On Thu, Jan 28, 2021 at 08:25:30PM -0500, Taylor Blau wrote:

> But, getting back to your original comment, I think that I actually
> prefer to have the GIT_TEST_XYZ_DIE stuff in the binary than I do
> relying on the perf suite to catch stuff like this.
> 
> I understand your concern about the binary size. I guess you could
> #ifdef this out and only build it in during the test suite, but then
> you're testing a different binary, and so that calls into question the
> integrity of the test suite itself, etc. etc.

I care less about binary size, and more just that the production binary
people use day-to-day has a bunch of weird test-only behaviors baked
into it. I guess nobody is likely to trigger this by accident, though,
and anybody who can maliciously impact the environment can do a lot
worse.  So it's not the end of the world to keep. It just offends my
sense of cleanliness and simplicity. ;)

> So, I guess that's all to say that I while I do find this to be hack-y
> and gross, I don't think that it's all that bad when you compare it to
> the alternatives.
> 
> As usual, I'm happy to change it if you feel strongly that it should be
> changed.

I agree the alternatives are pretty lousy. I don't feel strongly enough
to complain further. :)

-Peff
diff mbox series

Patch

diff --git a/pack-revindex.c b/pack-revindex.c
index a174fa5388..83fe4de773 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -2,6 +2,7 @@ 
 #include "pack-revindex.h"
 #include "object-store.h"
 #include "packfile.h"
+#include "config.h"
 
 struct revindex_entry {
 	off_t offset;
@@ -166,6 +167,9 @@  static void create_pack_revindex(struct packed_git *p)
 
 static int create_pack_revindex_in_memory(struct packed_git *p)
 {
+	if (git_env_bool(GIT_TEST_REV_INDEX_DIE_IN_MEMORY, 0))
+		die("dying as requested by '%s'",
+		    GIT_TEST_REV_INDEX_DIE_IN_MEMORY);
 	if (open_pack_index(p))
 		return -1;
 	create_pack_revindex(p);
diff --git a/pack-revindex.h b/pack-revindex.h
index d1a0595e89..ba7c82c125 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -21,6 +21,7 @@ 
 #define RIDX_VERSION 1
 
 #define GIT_TEST_WRITE_REV_INDEX "GIT_TEST_WRITE_REV_INDEX"
+#define GIT_TEST_REV_INDEX_DIE_IN_MEMORY "GIT_TEST_REV_INDEX_DIE_IN_MEMORY"
 
 struct packed_git;
 
diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
index be452bb343..a344b18d7e 100755
--- a/t/t5325-reverse-index.sh
+++ b/t/t5325-reverse-index.sh
@@ -85,4 +85,13 @@  test_expect_success 'pack-objects respects pack.writeReverseIndex' '
 	test_path_is_file pack-1-*.rev
 '
 
+test_expect_success 'reverse index is not generated when available on disk' '
+	test_index_pack true &&
+	test_path_is_file $rev &&
+
+	git rev-parse HEAD >tip &&
+	GIT_TEST_REV_INDEX_DIE_IN_MEMORY=1 git cat-file \
+		--batch-check="%(objectsize:disk)" <tip
+'
+
 test_done