diff mbox series

[4/4] t5326: test propagating hashcache values

Message ID acf3ec60cb6f151a9f121d242f38fef6711cced4.1631049462.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-bitmap: permute existing namehash values | expand

Commit Message

Taylor Blau Sept. 7, 2021, 9:18 p.m. UTC
Now that we both can propagate values from the hashcache, and respect
the configuration to enable the hashcache at all, test that both of
these function correctly by hardening their behavior with a test.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

Comments

Ævar Arnfjörð Bjarmason Sept. 8, 2021, 1:46 a.m. UTC | #1
On Tue, Sep 07 2021, Taylor Blau wrote:

> Now that we both can propagate values from the hashcache, and respect
> the configuration to enable the hashcache at all, test that both of
> these function correctly by hardening their behavior with a test.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  t/t5326-multi-pack-bitmaps.sh | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
>
> diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> index 4ad7c2c969..24148ca35b 100755
> --- a/t/t5326-multi-pack-bitmaps.sh
> +++ b/t/t5326-multi-pack-bitmaps.sh
> @@ -283,4 +283,36 @@ test_expect_success 'pack.preferBitmapTips' '
>  	)
>  '
>  
> +test_expect_success 'hash-cache values are propagated from pack bitmaps' '
> +	rm -fr repo &&
> +	git init repo &&
> +	test_when_finished "rm -fr repo" &&

It seems the need for this "rm -fr repo" dance instead of just relying
on test_when_finished "rm -rf repo" is because of a 3x tests in a
function in tb/multi-pack-bitmaps that should probably be combined into
one, i.e. they share the same logical "repo" setup.

> +	(
> +		cd repo &&
> +
> +		git config pack.writeBitmapHashCache true &&

s/git config/test_config/, surely.
Taylor Blau Sept. 8, 2021, 2:30 a.m. UTC | #2
On Wed, Sep 08, 2021 at 03:46:29AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Sep 07 2021, Taylor Blau wrote:
>
> > Now that we both can propagate values from the hashcache, and respect
> > the configuration to enable the hashcache at all, test that both of
> > these function correctly by hardening their behavior with a test.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  t/t5326-multi-pack-bitmaps.sh | 32 ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> >
> > diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> > index 4ad7c2c969..24148ca35b 100755
> > --- a/t/t5326-multi-pack-bitmaps.sh
> > +++ b/t/t5326-multi-pack-bitmaps.sh
> > @@ -283,4 +283,36 @@ test_expect_success 'pack.preferBitmapTips' '
> >  	)
> >  '
> >
> > +test_expect_success 'hash-cache values are propagated from pack bitmaps' '
> > +	rm -fr repo &&
> > +	git init repo &&
> > +	test_when_finished "rm -fr repo" &&
>
> It seems the need for this "rm -fr repo" dance instead of just relying
> on test_when_finished "rm -rf repo" is because of a 3x tests in a
> function in tb/multi-pack-bitmaps that should probably be combined into
> one, i.e. they share the same logical "repo" setup.

Yeah. There's definitely room for clean-up there if we want to have each
of the tests operate on the same repo. I have always found sharing a
repo between tests difficult to reason about, since we have to assume
that any --run parameters could be passed in.

So I have tended to err on the side of creating and throwing away a new
repo per test, but I understand that's somewhat atypical for Git's
tests.

> > +	(
> > +		cd repo &&
> > +
> > +		git config pack.writeBitmapHashCache true &&
>
> s/git config/test_config/, surely.

No, since this is in a subshell (and we don't care about unsetting the
value of a config in a repo that we're going to throw away, anyways).

(Side-note: since this has a /bin/sh shebang, we can't detect that we're
in a subshell and so test_config actually _would_ work here. But
switching this test to run with /bin/bash where we can detect whether or
not we're in a subshell would cause this test to fail with test_config).

Thanks,
Taylor
Junio C Hamano Sept. 13, 2021, 12:46 a.m. UTC | #3
Taylor Blau <me@ttaylorr.com> writes:

> Now that we both can propagate values from the hashcache, and respect
> the configuration to enable the hashcache at all, test that both of
> these function correctly by hardening their behavior with a test.

When we serve "git fetch" requests from a resulting repository,
because we have namehash available without traversing, we can serve
a packfile to the requestor, using the reachability bitmap to pick
objects that need to be sent, grouping the blobs with the same
namehash to be deltified against each other, resulting in a pack
that is better compressed than before?

What I am wondering is if we can come up with a "now, because we no
longer lose hashcache, we can do X so much better, here are the
numbers".
Taylor Blau Sept. 14, 2021, 1:12 a.m. UTC | #4
On Sun, Sep 12, 2021 at 05:46:18PM -0700, Junio C Hamano wrote:
> What I am wondering is if we can come up with a "now, because we no
> longer lose hashcache, we can do X so much better, here are the
> numbers".

Sure, here they are. Bear in mind that these numbers are (a) on git.git
and (b) with `pack.threads` set to 1 so the overall runtime looks more
like CPU time.

  Test                            origin/tb/multi-pack-bitmaps   HEAD
  -------------------------------------------------------------------------------------
  5326.4: simulated clone         1.87(1.80+0.07)                1.46(1.42+0.03) -21.9%
  5326.5: simulated fetch         2.66(2.61+0.04)                1.47(1.43+0.04) -44.7%
  5326.6: pack to file (bitmap)   2.74(2.62+0.12)                1.89(1.82+0.07) -31.0%

Apologies for taking a little while to respond, I spent longer than I'm
willing to admit double checking these numbers with Peff because of
inconsistencies in my testing setup.

Alas, there they are. They are basically no different than having the
name-hash for single pack bitmaps, it's just now we don't throw them
away when generating a MIDX bitmap from a state where the repository
already has a single-pack bitmap.

Thanks,
Taylor
Junio C Hamano Sept. 14, 2021, 2:05 a.m. UTC | #5
Taylor Blau <me@ttaylorr.com> writes:

> Alas, there they are. They are basically no different than having the
> name-hash for single pack bitmaps, it's just now we don't throw them
> away when generating a MIDX bitmap from a state where the repository
> already has a single-pack bitmap.

I actually wasn't expecting any CPU/time difference.

I hope that we are talking about the same name-hash, which is used
to sort the blobs so that when pack-objects try to find a good delta
base, the blobs from the same path will sit close to each other and
hopefully fit in the pack window.

The effect I was hoping to see by not discarding the information was
that we find better delta base hence smaller deltas in the resulting
packfiles.

Thanks.
Taylor Blau Sept. 14, 2021, 5:11 a.m. UTC | #6
On Mon, Sep 13, 2021 at 07:05:32PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Alas, there they are. They are basically no different than having the
> > name-hash for single pack bitmaps, it's just now we don't throw them
> > away when generating a MIDX bitmap from a state where the repository
> > already has a single-pack bitmap.
>
> I actually wasn't expecting any CPU/time difference.

I think it is possible to see the CPU usage go down without affecting the
resulting pack size. See below for a more detailed analysis.

> I hope that we are talking about the same name-hash, which is used
> to sort the blobs so that when pack-objects try to find a good delta
> base, the blobs from the same path will sit close to each other and
> hopefully fit in the pack window.

Yes, of course.

> The effect I was hoping to see by not discarding the information was
> that we find better delta base hence smaller deltas in the resulting
> packfiles.

I think it is possible to observe either a decrease in CPU or a decrease
in the resulting pack size.

In my experience having the name-hash filled in results in finding good
delta pairs much more quickly than without, but that in many
repositories the resulting pack size is basically the same. In other
words, the resulting pack is pretty similar whether you use the
name-hash or not, it just affects how quickly you get there.

Some experiments to back that up: I instrumented the existing p5326 by
replacing anything like "pack-objects ... --stdout >/dev/null" with
"pack-objects ... --stdout >pack.tmp" and then added test_size's to
measure the size of each pack.

On the tip of this branch, the results are:

		Test                              origin/tb/multi-pack-bitmaps   HEAD
		----------------------------------------------------------------------------
		5326.5: simulated clone size                 3.3G                 3.3G +0.0%
		5326.7: simulated fetch size                10.5M                10.5M -0.2%
		5326.21: clone (partial bitmap)              3.3G                 3.3G +0.0%

Looking at c171d3e677 (pack-bitmap: implement optional name_hash cache,
2013-10-22), I modified[1] that script to replace timing pack-objects with
counting the number of bytes it wrote.

Doing that shows that the name-hash doesn't make a substantial difference in
the resulting pack size (numbers on a recent-ish copy of the kernel):

		Test                      c171d3e677d777c50231d8dea32ae691936da819^   c171d3e677d777c50231d8dea32ae691936da819
		--------------------------------------------------------------------------------------------------------------
		9999.3: simulated clone              3.2G                                        3.2G +0.0%
		9999.4: simulated fetch                32                                          32 +0.0%
		9999.6: partial bitmap               3.1G                                        3.1G +0.0%

(As a mostly-unrelated aside, I was curious why the pack size jumped from 3.2GB
to 3.3GB, but I can reproduce that jump even in p5310--the single pack bitmap
test--on the tip of my branch. So it does appear to be a regression which I'll
look into, but it's unrelated to this branch or MIDX bitmaps).

Thanks,
Taylor

[1]: https://gist.github.com/ttaylorr/6cfa3eb9fd012f81b833873d50f96f71
Taylor Blau Sept. 14, 2021, 5:17 a.m. UTC | #7
On Tue, Sep 14, 2021 at 01:11:34AM -0400, Taylor Blau wrote:
> On the tip of this branch, the results are:
>
> [...]

Eek, those tabs are horrific. I must have left my editor in paste mode
when inserting them, sorry about that.

Thanks,
Taylor
Jeff King Sept. 14, 2021, 5:23 a.m. UTC | #8
On Mon, Sep 13, 2021 at 07:05:32PM -0700, Junio C Hamano wrote:

> Taylor Blau <me@ttaylorr.com> writes:
> 
> > Alas, there they are. They are basically no different than having the
> > name-hash for single pack bitmaps, it's just now we don't throw them
> > away when generating a MIDX bitmap from a state where the repository
> > already has a single-pack bitmap.
> 
> I actually wasn't expecting any CPU/time difference.

I was, for the same reason we saw an improvement there in ae4f07fbcc
(pack-bitmap: implement optional name_hash cache, 2013-12-21): without a
name-hash, we try a bunch of fruitless deltas before we find a decent
one.

> I hope that we are talking about the same name-hash, which is used
> to sort the blobs so that when pack-objects try to find a good delta
> base, the blobs from the same path will sit close to each other and
> hopefully fit in the pack window.

Yes, exactly. We spend less time finding the good ones if the likely
candidates are close together. We may _also_ find better ones overall,
depending on the number of candidates and the window size.

The bitmap perf tests (neither p5310 nor its new midx cousin p5326)
don't check the output size.

> The effect I was hoping to see by not discarding the information was
> that we find better delta base hence smaller deltas in the resulting
> packfiles.

If we add a size check like so[1]:

diff --git a/t/perf/lib-bitmap.sh b/t/perf/lib-bitmap.sh
index 63d3bc7cec..648cd5b13d 100644
--- a/t/perf/lib-bitmap.sh
+++ b/t/perf/lib-bitmap.sh
@@ -10,7 +10,11 @@ test_full_bitmap () {
 		{
 			echo HEAD &&
 			echo ^$have
-		} | git pack-objects --revs --stdout >/dev/null
+		} | git pack-objects --revs --stdout >tmp.pack
+	'
+
+	test_size 'fetch size' '
+		wc -c <tmp.pack
 	'
 
 	test_perf 'pack to file (bitmap)' '

then the results I get using linux.git are:

Test                       origin/tb/multi-pack-bitmaps   origin/tb/midx-write-propagate-namehash
-------------------------------------------------------------------------------------------------
5326.4: simulated fetch    2.32(7.16+0.21)                2.00(3.79+0.18) -13.8%
5326.5: fetch size         16.7M                          15.5M -7.1%

so you can see that we spent about half as much CPU (ignore the
wall-clock percentage; the interesting thing is the userspace time,
because my machine has 8 cores). But we also shaved off a bit from the
pack, so we really did manage to find better deltas, too.

I see that Taylor just posted a very similar response, and independently
did the exact same experiment I did. ;) I'll send this anyway, though,
as my particular run showed slightly different results.

-Peff

[1] The other thing you'd want (and I presume Taylor was using for his
    earlier timings) is:

diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index 5845109ac7..a4ac7746a7 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -11,7 +11,7 @@ test_expect_success 'enable multi-pack index' '
 '
 
 test_perf 'setup multi-pack index' '
-	git repack -ad &&
+	git repack -adb &&
 	git multi-pack-index write --bitmap
 '
 

since otherwise there is no pack bitmap for the midx to pull the
name-hashes from.
Jeff King Sept. 14, 2021, 5:27 a.m. UTC | #9
On Tue, Sep 14, 2021 at 01:11:34AM -0400, Taylor Blau wrote:

> Some experiments to back that up: I instrumented the existing p5326 by
> replacing anything like "pack-objects ... --stdout >/dev/null" with
> "pack-objects ... --stdout >pack.tmp" and then added test_size's to
> measure the size of each pack.
> 
> On the tip of this branch, the results are:
> 
> 		Test                              origin/tb/multi-pack-bitmaps   HEAD
> 		----------------------------------------------------------------------------
> 		5326.5: simulated clone size                 3.3G                 3.3G +0.0%
> 		5326.7: simulated fetch size                10.5M                10.5M -0.2%
> 		5326.21: clone (partial bitmap)              3.3G                 3.3G +0.0%

I wouldn't expect a change in the clone size. We're already including
all the objects from the single pack, so we won't even look for new
deltas.

In my run, I did see a small improvement in the fetch size (though my
size both before and after was larger than yours). This is going to
depend on the exact set of deltas we have (which in turn depends on how
your repo happens to have been packed before the script even starts) and
which ones the client actually wants (which may depend on the exact tip
of your repo).

Presumably you also saw a decrease in the user CPU time of 5326.6 here.
If not, you may have forgotten the extra patch to create the pack
bitmap.

-Peff
Taylor Blau Sept. 14, 2021, 5:31 a.m. UTC | #10
On Tue, Sep 14, 2021 at 01:27:31AM -0400, Jeff King wrote:
> On Tue, Sep 14, 2021 at 01:11:34AM -0400, Taylor Blau wrote:
>
> > Some experiments to back that up: I instrumented the existing p5326 by
> > replacing anything like "pack-objects ... --stdout >/dev/null" with
> > "pack-objects ... --stdout >pack.tmp" and then added test_size's to
> > measure the size of each pack.
> >
> > On the tip of this branch, the results are:
> >
> > 		Test                              origin/tb/multi-pack-bitmaps   HEAD
> > 		----------------------------------------------------------------------------
> > 		5326.5: simulated clone size                 3.3G                 3.3G +0.0%
> > 		5326.7: simulated fetch size                10.5M                10.5M -0.2%
> > 		5326.21: clone (partial bitmap)              3.3G                 3.3G +0.0%
>
> I wouldn't expect a change in the clone size. We're already including
> all the objects from the single pack, so we won't even look for new
> deltas.
>
> In my run, I did see a small improvement in the fetch size (though my
> size both before and after was larger than yours). This is going to
> depend on the exact set of deltas we have (which in turn depends on how
> your repo happens to have been packed before the script even starts) and
> which ones the client actually wants (which may depend on the exact tip
> of your repo).

Yes, I agree with all of that. I am still interested in trying to figure
out why the resulting clone size seems to go up (independent of the
changes here). I'm bisecting it, but it's slow, since every step
requires you to repack the kernel.

> Presumably you also saw a decrease in the user CPU time of 5326.6 here.
> If not, you may have forgotten the extra patch to create the pack
> bitmap.

I did, but didn't bother to include the timings in the quoted part,
since I already shared them in [1].

I have a handful of new patches for an updated version of this series
which explains the extra patch you are talking about, too.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/YT%2F3BuDa7KfUN%2F38@nand.local/
Junio C Hamano Sept. 14, 2021, 5:49 a.m. UTC | #11
Jeff King <peff@peff.net> writes:

> On Mon, Sep 13, 2021 at 07:05:32PM -0700, Junio C Hamano wrote:
>
>> Taylor Blau <me@ttaylorr.com> writes:
>> 
>> > Alas, there they are. They are basically no different than having the
>> > name-hash for single pack bitmaps, it's just now we don't throw them
>> > away when generating a MIDX bitmap from a state where the repository
>> > already has a single-pack bitmap.
>> 
>> I actually wasn't expecting any CPU/time difference.
>
> I was, for the same reason we saw an improvement there in ae4f07fbcc
> (pack-bitmap: implement optional name_hash cache, 2013-12-21): without a
> name-hash, we try a bunch of fruitless deltas before we find a decent
> one.

Nice.  We learn new things every once in a while ;-)

>> I hope that we are talking about the same name-hash, which is used
>> to sort the blobs so that when pack-objects try to find a good delta
>> base, the blobs from the same path will sit close to each other and
>> hopefully fit in the pack window.
>
> Yes, exactly. We spend less time finding the good ones if the likely
> candidates are close together. We may _also_ find better ones overall,
> depending on the number of candidates and the window size.

It is a pleasant surprise that in a real history like linux.git we
can even find good delta base without the name hash (unless we are
using insanely wide window size, that is).  The objects in such a
case will be sorted solely by size, larger to smaller, and we need
to find a good delta base within that window.  It may not be as
horrible as fast-import (which only tries to delta against the
previous single object), but with ~70k paths in a single revision
with history that is ~1m deep, I was pessimistic to see a size-only
sort to drop even a pair of blobs from the same path within the
default window size of 10.  But it seems that such a pessimism is
unwarranted, which is a good news ;-).
Ævar Arnfjörð Bjarmason Sept. 17, 2021, 8:56 a.m. UTC | #12
On Tue, Sep 07 2021, Taylor Blau wrote:

> On Wed, Sep 08, 2021 at 03:46:29AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Tue, Sep 07 2021, Taylor Blau wrote:
>>
>> > Now that we both can propagate values from the hashcache, and respect
>> > the configuration to enable the hashcache at all, test that both of
>> > these function correctly by hardening their behavior with a test.
>> >
>> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
>> > ---
>> >  t/t5326-multi-pack-bitmaps.sh | 32 ++++++++++++++++++++++++++++++++
>> >  1 file changed, 32 insertions(+)
>> >
>> > diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
>> > index 4ad7c2c969..24148ca35b 100755
>> > --- a/t/t5326-multi-pack-bitmaps.sh
>> > +++ b/t/t5326-multi-pack-bitmaps.sh
>> > @@ -283,4 +283,36 @@ test_expect_success 'pack.preferBitmapTips' '
>> >  	)
>> >  '
>> >
>> > +test_expect_success 'hash-cache values are propagated from pack bitmaps' '
>> > +	rm -fr repo &&
>> > +	git init repo &&
>> > +	test_when_finished "rm -fr repo" &&
>>
>> It seems the need for this "rm -fr repo" dance instead of just relying
>> on test_when_finished "rm -rf repo" is because of a 3x tests in a
>> function in tb/multi-pack-bitmaps that should probably be combined into
>> one, i.e. they share the same logical "repo" setup.
>
> Yeah. There's definitely room for clean-up there if we want to have each
> of the tests operate on the same repo. I have always found sharing a
> repo between tests difficult to reason about, since we have to assume
> that any --run parameters could be passed in.
>
> So I have tended to err on the side of creating and throwing away a new
> repo per test, but I understand that's somewhat atypical for Git's
> tests.

Just in an effort to make myself clear here, because between your note
in https://lore.kernel.org/git/YUOds4kcHRgMk5nC@nand.local/ and
re-reading my mail here I can barely understand what I meant here :)

What I mean is that the whole "rm -rf" dance at the start of tests is
fallout from an earlier call you made in c51f5a6437c (t5326: test
multi-pack bitmap behavior, 2021-08-31) to split up three tests that
really are the same logical test in terms of setup/teardown.

If they were to be re-combined as shown in the diff-on-top at the end
here none of the tests need a "rm -rf repo" at the beginning, because
they can all rely on a starting pattern of:

    test_when_finished "rm -rf repo" &&
    git init repo &&
    ( cd repo ... )

Or, you can also just not re-use the "repo" name, which is what I did in
the fsck tests at
https://lore.kernel.org/git/patch-v6-01.22-ebe89f65354-20210907T104559Z-avarab@gmail.com/;
then you don't even need test_when_finished "rm -rf[ ...]".

None of this requires fixing here or a re-roll, unless you happen to
think this diff is the best thing since sliced bread (assume my
Signed-off-by).

But just as a general musing on patterns to use in tests, I see why you
used that 1x test as 3x, because you want "test_expect_success" to give
you the label on for each "step".

I think that would be worth fixing in test-lib.sh, there's no inherent
reason we couldn't support "checkpoints" or "subtests" within
"test_expect_success" (the latter being a part of the TAP protocol).

But until then IMNSHO this sort of thing is an anti-pattern,
i.e. needing to write things like:

    test_expect_success '[...]' 'A'
    test_expect_success '[...]' 'B'

Where a part of what B needs to do is to clean up the mess left behind
A, or relies on its setup.

That's just added verbosity and context when reading the code. Ideally
you'd just need to read the "setup" at the start, and then individual
tests, with this pattern you need to read the preceding test(s) to see
where the crap they're cleaning came from, and if it's even needed etc.

(For the "setup" part I've suggested a test_expect_setup, see
https://lore.kernel.org/git/8735vrvg39.fsf@evledraar.gmail.com/).

The "needs the setup" part of this has the negative side-effect of
breaking the semantics of the --run option. As noted in the E-Mail
linked in the last paragraph it doesn't work well in general because of
things like this, but let's steer in the right direction.

I.e. with this change you can run:

    ./t5326-multi-pack-bitmaps.sh --run=111-113

Which is great, usually you need at least 1,<nums you want> ,r
1-10,<nums you want> to get the setup. But because you split them this
breaks:

    ./t5326-multi-pack-bitmaps.sh --run=112-113

I.e. we'll run only the 2nd and 3rd test in a sequence that needs the
1st for setup.

For context: My preference for this isn't just an asthetic or
theoretical thing. It's really nice to be hacking some code, find that
I've broken the bitmap code somehow in your test #112, and be able to
just run (since running the whole thing takes 13s on my box, and I'd
like to test in a tight loop):

    ./t5326-multi-pack-bitmaps.sh --run=112

But that breaks as noted above, so I need read earlier tests (I was
probably looking at test #112 already at this point) and see how their
setup works etc.

Anyway, as with so many things in git.git being able to just assume
--run works like this isn't something you can rely on in the general
case, but just as a matter of where we should be headed...

>> > +	(
>> > +		cd repo &&
>> > +
>> > +		git config pack.writeBitmapHashCache true &&
>>
>> s/git config/test_config/, surely.
>
> No, since this is in a subshell (and we don't care about unsetting the
> value of a config in a repo that we're going to throw away, anyways).
>
> (Side-note: since this has a /bin/sh shebang, we can't detect that we're
> in a subshell and so test_config actually _would_ work here. But
> switching this test to run with /bin/bash where we can detect whether or
> not we're in a subshell would cause this test to fail with test_config).

Yes, you're 100% right here. This was purely a misreading on my part, I
managed to somehow not see/take into account the subshell. Using
test_config makes no sense here.

The diff-on-top for discussion mentioned above:

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 24148ca35b3..a6bb0abb387 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -133,8 +133,8 @@ bitmap_reuse_tests() {
 	from=$1
 	to=$2
 
-	test_expect_success "setup pack reuse tests ($from -> $to)" '
-		rm -fr repo &&
+	test_expect_success "setup/build/verify ($from -> $to) pack reuse bitmaps" '
+		test_when_finished "rm -rf repo" &&
 		git init repo &&
 		(
 			cd repo &&
@@ -148,13 +148,9 @@ bitmap_reuse_tests() {
 				git multi-pack-index write --bitmap
 			else
 				git repack -Adb
-			fi
-		)
-	'
+			fi &&
 
-	test_expect_success "build bitmap from existing ($from -> $to)" '
-		(
-			cd repo &&
+			# Build
 			test_commit_bulk --id=further 16 &&
 			git tag new-tip &&
 
@@ -164,13 +160,9 @@ bitmap_reuse_tests() {
 				git multi-pack-index write --bitmap
 			else
 				git repack -Adb
-			fi
-		)
-	'
+			fi &&
 
-	test_expect_success "verify resulting bitmaps ($from -> $to)" '
-		(
-			cd repo &&
+			# Verify
 			git for-each-ref &&
 			git rev-list --test-bitmap refs/tags/old-tip &&
 			git rev-list --test-bitmap refs/tags/new-tip
@@ -183,9 +175,8 @@ bitmap_reuse_tests 'MIDX' 'pack'
 bitmap_reuse_tests 'MIDX' 'MIDX'
 
 test_expect_success 'missing object closure fails gracefully' '
-	rm -fr repo &&
-	git init repo &&
 	test_when_finished "rm -fr repo" &&
+	git init repo &&
 	(
 		cd repo &&
 
@@ -217,9 +208,8 @@ test_expect_success 'setup partial bitmaps' '
 basic_bitmap_tests HEAD~
 
 test_expect_success 'removing a MIDX clears stale bitmaps' '
-	rm -fr repo &&
-	git init repo &&
 	test_when_finished "rm -fr repo" &&
+	git init repo &&
 	(
 		cd repo &&
 		test_commit base &&
@@ -245,8 +235,8 @@ test_expect_success 'removing a MIDX clears stale bitmaps' '
 '
 
 test_expect_success 'pack.preferBitmapTips' '
-	git init repo &&
 	test_when_finished "rm -fr repo" &&
+	git init repo &&
 	(
 		cd repo &&
 
@@ -284,9 +274,8 @@ test_expect_success 'pack.preferBitmapTips' '
 '
 
 test_expect_success 'hash-cache values are propagated from pack bitmaps' '
-	rm -fr repo &&
-	git init repo &&
 	test_when_finished "rm -fr repo" &&
+	git init repo &&
 	(
 		cd repo &&
Taylor Blau Sept. 17, 2021, 5:32 p.m. UTC | #13
On Fri, Sep 17, 2021 at 10:56:09AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> It seems the need for this "rm -fr repo" dance instead of just relying
> >> on test_when_finished "rm -rf repo" is because of a 3x tests in a
> >> function in tb/multi-pack-bitmaps that should probably be combined into
> >> one, i.e. they share the same logical "repo" setup.
> >
> > Yeah. There's definitely room for clean-up there if we want to have each
> > of the tests operate on the same repo. I have always found sharing a
> > repo between tests difficult to reason about, since we have to assume
> > that any --run parameters could be passed in.
> >
> > So I have tended to err on the side of creating and throwing away a new
> > repo per test, but I understand that's somewhat atypical for Git's
> > tests.
>
> Just in an effort to make myself clear here, because between your note
> in https://lore.kernel.org/git/YUOds4kcHRgMk5nC@nand.local/ and
> re-reading my mail here I can barely understand what I meant here :)

Thanks for clarifying; if I could summarize I would say that this
discussion got started since a new test I added starts with:

    rm -fr repo &&
    git init repo &&
    test_when_finished "rm -fr repo"

where the first rm is designed to clean any state left behind from the
split-up tests added in c51f5a6437c.

I understand your suggestion, and I even think that it may be worth
doing, but I'm not sure that I buy that any such cleanup is related to
this series for any reason other than "you happened to add a new test in
this file which extended the pattern".

So let's pursue the cleanup, but outside of this series (and maybe once
the dust has settled more generally on the MIDX bitmaps topics to avoid
having the maintainer deal with any conflicts).

> Or, you can also just not re-use the "repo" name, which is what I did in
> the fsck tests at
> https://lore.kernel.org/git/patch-v6-01.22-ebe89f65354-20210907T104559Z-avarab@gmail.com/;
> then you don't even need test_when_finished "rm -rf[ ...]".

TBH, I think that this is the most appealing thing to do. It's easy and
doesn't require us to think too hard about test_expect_setup or
sub-tests or anything like that where I'm not sure the complexity is
warranted.

I would probably do something like this:

--- >8 ---

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index ec4aa89f63..11845f67ae 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -132,12 +132,12 @@ test_expect_success 'clone with bitmaps enabled' '
 bitmap_reuse_tests() {
 	from=$1
 	to=$2
+	repo="bitmap-reuse-$from-$to"

 	test_expect_success "setup pack reuse tests ($from -> $to)" '
-		rm -fr repo &&
-		git init repo &&
+		git init $repo &&
 		(
-			cd repo &&
+			cd $repo &&
 			test_commit_bulk 16 &&
 			git tag old-tip &&

@@ -154,7 +154,7 @@ bitmap_reuse_tests() {

 	test_expect_success "build bitmap from existing ($from -> $to)" '
 		(
-			cd repo &&
+			cd $repo &&
 			test_commit_bulk --id=further 16 &&
 			git tag new-tip &&

@@ -170,7 +170,7 @@ bitmap_reuse_tests() {

 	test_expect_success "verify resulting bitmaps ($from -> $to)" '
 		(
-			cd repo &&
+			cd $repo &&
 			git for-each-ref &&
 			git rev-list --test-bitmap refs/tags/old-tip &&
 			git rev-list --test-bitmap refs/tags/new-tip
@@ -183,7 +183,6 @@ bitmap_reuse_tests 'MIDX' 'pack'
 bitmap_reuse_tests 'MIDX' 'MIDX'

 test_expect_success 'missing object closure fails gracefully' '
-	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
 	(
@@ -217,7 +216,6 @@ test_expect_success 'setup partial bitmaps' '
 basic_bitmap_tests HEAD~

 test_expect_success 'removing a MIDX clears stale bitmaps' '
-	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
 	(
@@ -284,7 +282,6 @@ test_expect_success 'pack.preferBitmapTips' '
 '

 test_expect_success 'hash-cache values are propagated from pack bitmaps' '
-	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
 	(
Ævar Arnfjörð Bjarmason Sept. 17, 2021, 7:22 p.m. UTC | #14
On Fri, Sep 17 2021, Taylor Blau wrote:

> On Fri, Sep 17, 2021 at 10:56:09AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >> It seems the need for this "rm -fr repo" dance instead of just relying
>> >> on test_when_finished "rm -rf repo" is because of a 3x tests in a
>> >> function in tb/multi-pack-bitmaps that should probably be combined into
>> >> one, i.e. they share the same logical "repo" setup.
>> >
>> > Yeah. There's definitely room for clean-up there if we want to have each
>> > of the tests operate on the same repo. I have always found sharing a
>> > repo between tests difficult to reason about, since we have to assume
>> > that any --run parameters could be passed in.
>> >
>> > So I have tended to err on the side of creating and throwing away a new
>> > repo per test, but I understand that's somewhat atypical for Git's
>> > tests.
>>
>> Just in an effort to make myself clear here, because between your note
>> in https://lore.kernel.org/git/YUOds4kcHRgMk5nC@nand.local/ and
>> re-reading my mail here I can barely understand what I meant here :)
>
> Thanks for clarifying; if I could summarize I would say that this
> discussion got started since a new test I added starts with:
>
>     rm -fr repo &&
>     git init repo &&
>     test_when_finished "rm -fr repo"
>
> where the first rm is designed to clean any state left behind from the
> split-up tests added in c51f5a6437c.
>
> I understand your suggestion, and I even think that it may be worth
> doing, but I'm not sure that I buy that any such cleanup is related to
> this series for any reason other than "you happened to add a new test in
> this file which extended the pattern".
>
> So let's pursue the cleanup, but outside of this series (and maybe once
> the dust has settled more generally on the MIDX bitmaps topics to avoid
> having the maintainer deal with any conflicts).

Yes I agree, FWIW I didn't look carefully at what was in-flight where at
the time, or the series this depends on was itself in "seen", I can't
remember which..

>> Or, you can also just not re-use the "repo" name, which is what I did in
>> the fsck tests at
>> https://lore.kernel.org/git/patch-v6-01.22-ebe89f65354-20210907T104559Z-avarab@gmail.com/;
>> then you don't even need test_when_finished "rm -rf[ ...]".
>
> TBH, I think that this is the most appealing thing to do. It's easy and
> doesn't require us to think too hard about test_expect_setup or
> sub-tests or anything like that where I'm not sure the complexity is
> warranted.
>
> I would probably do something like this:

Yeah, that bypasses the cleanup bit, but leaves --run broken as I
described. For new code I think it's good practice to have --run work
too.

Again, I think that's a non-issue here considering the rest of the
dumpster fire in the test suite when it comes to that, so I'm not
suggesting a re-roll or whatever.

I just used the upthread as a jumping off point since you had a question
about these patterns in the other topic, and perhaps you/some bystander
would be convinced and follow that pattern in the future.

> --- >8 ---
>
> diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> index ec4aa89f63..11845f67ae 100755
> --- a/t/t5326-multi-pack-bitmaps.sh
> +++ b/t/t5326-multi-pack-bitmaps.sh
> @@ -132,12 +132,12 @@ test_expect_success 'clone with bitmaps enabled' '
>  bitmap_reuse_tests() {
>  	from=$1
>  	to=$2
> +	repo="bitmap-reuse-$from-$to"
>
>  	test_expect_success "setup pack reuse tests ($from -> $to)" '
> -		rm -fr repo &&
> -		git init repo &&
> +		git init $repo &&
>  		(
> -			cd repo &&
> +			cd $repo &&
>  			test_commit_bulk 16 &&
>  			git tag old-tip &&
>
> @@ -154,7 +154,7 @@ bitmap_reuse_tests() {
>
>  	test_expect_success "build bitmap from existing ($from -> $to)" '
>  		(
> -			cd repo &&
> +			cd $repo &&
>  			test_commit_bulk --id=further 16 &&
>  			git tag new-tip &&
>
> @@ -170,7 +170,7 @@ bitmap_reuse_tests() {
>
>  	test_expect_success "verify resulting bitmaps ($from -> $to)" '
>  		(
> -			cd repo &&
> +			cd $repo &&
>  			git for-each-ref &&
>  			git rev-list --test-bitmap refs/tags/old-tip &&
>  			git rev-list --test-bitmap refs/tags/new-tip
> @@ -183,7 +183,6 @@ bitmap_reuse_tests 'MIDX' 'pack'
>  bitmap_reuse_tests 'MIDX' 'MIDX'
>
>  test_expect_success 'missing object closure fails gracefully' '
> -	rm -fr repo &&
>  	git init repo &&
>  	test_when_finished "rm -fr repo" &&
>  	(
> @@ -217,7 +216,6 @@ test_expect_success 'setup partial bitmaps' '
>  basic_bitmap_tests HEAD~
>
>  test_expect_success 'removing a MIDX clears stale bitmaps' '
> -	rm -fr repo &&
>  	git init repo &&
>  	test_when_finished "rm -fr repo" &&
>  	(
> @@ -284,7 +282,6 @@ test_expect_success 'pack.preferBitmapTips' '
>  '
>
>  test_expect_success 'hash-cache values are propagated from pack bitmaps' '
> -	rm -fr repo &&
>  	git init repo &&
>  	test_when_finished "rm -fr repo" &&
>  	(
diff mbox series

Patch

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4ad7c2c969..24148ca35b 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -283,4 +283,36 @@  test_expect_success 'pack.preferBitmapTips' '
 	)
 '
 
+test_expect_success 'hash-cache values are propagated from pack bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		git config pack.writeBitmapHashCache true &&
+
+		test_commit base &&
+		test_commit base2 &&
+		git repack -adb &&
+
+		test-tool bitmap dump-hashes >pack.raw &&
+		test_file_not_empty pack.raw &&
+		sort pack.raw >pack.hashes &&
+
+		test_commit new &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		test-tool bitmap dump-hashes >midx.raw &&
+		sort midx.raw >midx.hashes &&
+
+		# ensure that every namehash in the pack bitmap can be found in
+		# the midx bitmap (i.e., that there are no oid-namehash pairs
+		# unique to the pack bitmap).
+		comm -23 pack.hashes midx.hashes >dropped.hashes &&
+		test_must_be_empty dropped.hashes
+	)
+'
+
 test_done