Message ID | acf3ec60cb6f151a9f121d242f38fef6711cced4.1631049462.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | pack-bitmap: permute existing namehash values | expand |
On Tue, Sep 07 2021, Taylor Blau wrote: > Now that we both can propagate values from the hashcache, and respect > the configuration to enable the hashcache at all, test that both of > these function correctly by hardening their behavior with a test. > > Signed-off-by: Taylor Blau <me@ttaylorr.com> > --- > t/t5326-multi-pack-bitmaps.sh | 32 ++++++++++++++++++++++++++++++++ > 1 file changed, 32 insertions(+) > > diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh > index 4ad7c2c969..24148ca35b 100755 > --- a/t/t5326-multi-pack-bitmaps.sh > +++ b/t/t5326-multi-pack-bitmaps.sh > @@ -283,4 +283,36 @@ test_expect_success 'pack.preferBitmapTips' ' > ) > ' > > +test_expect_success 'hash-cache values are propagated from pack bitmaps' ' > + rm -fr repo && > + git init repo && > + test_when_finished "rm -fr repo" && It seems the need for this "rm -fr repo" dance instead of just relying on test_when_finished "rm -rf repo" is because of a 3x tests in a function in tb/multi-pack-bitmaps that should probably be combined into one, i.e. they share the same logical "repo" setup. > + ( > + cd repo && > + > + git config pack.writeBitmapHashCache true && s/git config/test_config/, surely.
On Wed, Sep 08, 2021 at 03:46:29AM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Sep 07 2021, Taylor Blau wrote: > > > Now that we both can propagate values from the hashcache, and respect > > the configuration to enable the hashcache at all, test that both of > > these function correctly by hardening their behavior with a test. > > > > Signed-off-by: Taylor Blau <me@ttaylorr.com> > > --- > > t/t5326-multi-pack-bitmaps.sh | 32 ++++++++++++++++++++++++++++++++ > > 1 file changed, 32 insertions(+) > > > > diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh > > index 4ad7c2c969..24148ca35b 100755 > > --- a/t/t5326-multi-pack-bitmaps.sh > > +++ b/t/t5326-multi-pack-bitmaps.sh > > @@ -283,4 +283,36 @@ test_expect_success 'pack.preferBitmapTips' ' > > ) > > ' > > > > +test_expect_success 'hash-cache values are propagated from pack bitmaps' ' > > + rm -fr repo && > > + git init repo && > > + test_when_finished "rm -fr repo" && > > It seems the need for this "rm -fr repo" dance instead of just relying > on test_when_finished "rm -rf repo" is because of a 3x tests in a > function in tb/multi-pack-bitmaps that should probably be combined into > one, i.e. they share the same logical "repo" setup. Yeah. There's definitely room for clean-up there if we want to have each of the tests operate on the same repo. I have always found sharing a repo between tests difficult to reason about, since we have to assume that any --run parameters could be passed in. So I have tended to err on the side of creating and throwing away a new repo per test, but I understand that's somewhat atypical for Git's tests. > > + ( > > + cd repo && > > + > > + git config pack.writeBitmapHashCache true && > > s/git config/test_config/, surely. No, since this is in a subshell (and we don't care about unsetting the value of a config in a repo that we're going to throw away, anyways). (Side-note: since this has a /bin/sh shebang, we can't detect that we're in a subshell and so test_config actually _would_ work here. But switching this test to run with /bin/bash where we can detect whether or not we're in a subshell would cause this test to fail with test_config). Thanks, Taylor
Taylor Blau <me@ttaylorr.com> writes: > Now that we both can propagate values from the hashcache, and respect > the configuration to enable the hashcache at all, test that both of > these function correctly by hardening their behavior with a test. When we serve "git fetch" requests from a resulting repository, because we have namehash available without traversing, we can serve a packfile to the requestor, using the reachability bitmap to pick objects that need to be sent, grouping the blobs with the same namehash to be deltified against each other, resulting in a pack that is better compressed than before? What I am wondering is if we can come up with a "now, because we no longer lose hashcache, we can do X so much better, here are the numbers".
On Sun, Sep 12, 2021 at 05:46:18PM -0700, Junio C Hamano wrote: > What I am wondering is if we can come up with a "now, because we no > longer lose hashcache, we can do X so much better, here are the > numbers". Sure, here they are. Bear in mind that these numbers are (a) on git.git and (b) with `pack.threads` set to 1 so the overall runtime looks more like CPU time. Test origin/tb/multi-pack-bitmaps HEAD ------------------------------------------------------------------------------------- 5326.4: simulated clone 1.87(1.80+0.07) 1.46(1.42+0.03) -21.9% 5326.5: simulated fetch 2.66(2.61+0.04) 1.47(1.43+0.04) -44.7% 5326.6: pack to file (bitmap) 2.74(2.62+0.12) 1.89(1.82+0.07) -31.0% Apologies for taking a little while to respond, I spent longer than I'm willing to admit double checking these numbers with Peff because of inconsistencies in my testing setup. Alas, there they are. They are basically no different than having the name-hash for single pack bitmaps, it's just now we don't throw them away when generating a MIDX bitmap from a state where the repository already has a single-pack bitmap. Thanks, Taylor
Taylor Blau <me@ttaylorr.com> writes: > Alas, there they are. They are basically no different than having the > name-hash for single pack bitmaps, it's just now we don't throw them > away when generating a MIDX bitmap from a state where the repository > already has a single-pack bitmap. I actually wasn't expecting any CPU/time difference. I hope that we are talking about the same name-hash, which is used to sort the blobs so that when pack-objects try to find a good delta base, the blobs from the same path will sit close to each other and hopefully fit in the pack window. The effect I was hoping to see by not discarding the information was that we find better delta base hence smaller deltas in the resulting packfiles. Thanks.
On Mon, Sep 13, 2021 at 07:05:32PM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > Alas, there they are. They are basically no different than having the > > name-hash for single pack bitmaps, it's just now we don't throw them > > away when generating a MIDX bitmap from a state where the repository > > already has a single-pack bitmap. > > I actually wasn't expecting any CPU/time difference. I think it is possible to see the CPU usage go down without affecting the resulting pack size. See below for a more detailed analysis. > I hope that we are talking about the same name-hash, which is used > to sort the blobs so that when pack-objects try to find a good delta > base, the blobs from the same path will sit close to each other and > hopefully fit in the pack window. Yes, of course. > The effect I was hoping to see by not discarding the information was > that we find better delta base hence smaller deltas in the resulting > packfiles. I think it is possible to observe either a decrease in CPU or a decrease in the resulting pack size. In my experience having the name-hash filled in results in finding good delta pairs much more quickly than without, but that in many repositories the resulting pack size is basically the same. In other words, the resulting pack is pretty similar whether you use the name-hash or not, it just affects how quickly you get there. Some experiments to back that up: I instrumented the existing p5326 by replacing anything like "pack-objects ... --stdout >/dev/null" with "pack-objects ... --stdout >pack.tmp" and then added test_size's to measure the size of each pack. On the tip of this branch, the results are: Test origin/tb/multi-pack-bitmaps HEAD ---------------------------------------------------------------------------- 5326.5: simulated clone size 3.3G 3.3G +0.0% 5326.7: simulated fetch size 10.5M 10.5M -0.2% 5326.21: clone (partial bitmap) 3.3G 3.3G +0.0% Looking at c171d3e677 (pack-bitmap: implement optional name_hash cache, 2013-10-22), I modified[1] that script to replace timing pack-objects with counting the number of bytes it wrote. Doing that shows that the name-hash doesn't make a substantial difference in the resulting pack size (numbers on a recent-ish copy of the kernel): Test c171d3e677d777c50231d8dea32ae691936da819^ c171d3e677d777c50231d8dea32ae691936da819 -------------------------------------------------------------------------------------------------------------- 9999.3: simulated clone 3.2G 3.2G +0.0% 9999.4: simulated fetch 32 32 +0.0% 9999.6: partial bitmap 3.1G 3.1G +0.0% (As a mostly-unrelated aside, I was curious why the pack size jumped from 3.2GB to 3.3GB, but I can reproduce that jump even in p5310--the single pack bitmap test--on the tip of my branch. So it does appear to be a regression which I'll look into, but it's unrelated to this branch or MIDX bitmaps). Thanks, Taylor [1]: https://gist.github.com/ttaylorr/6cfa3eb9fd012f81b833873d50f96f71
On Tue, Sep 14, 2021 at 01:11:34AM -0400, Taylor Blau wrote: > On the tip of this branch, the results are: > > [...] Eek, those tabs are horrific. I must have left my editor in paste mode when inserting them, sorry about that. Thanks, Taylor
On Mon, Sep 13, 2021 at 07:05:32PM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > Alas, there they are. They are basically no different than having the > > name-hash for single pack bitmaps, it's just now we don't throw them > > away when generating a MIDX bitmap from a state where the repository > > already has a single-pack bitmap. > > I actually wasn't expecting any CPU/time difference. I was, for the same reason we saw an improvement there in ae4f07fbcc (pack-bitmap: implement optional name_hash cache, 2013-12-21): without a name-hash, we try a bunch of fruitless deltas before we find a decent one. > I hope that we are talking about the same name-hash, which is used > to sort the blobs so that when pack-objects try to find a good delta > base, the blobs from the same path will sit close to each other and > hopefully fit in the pack window. Yes, exactly. We spend less time finding the good ones if the likely candidates are close together. We may _also_ find better ones overall, depending on the number of candidates and the window size. The bitmap perf tests (neither p5310 nor its new midx cousin p5326) don't check the output size. > The effect I was hoping to see by not discarding the information was > that we find better delta base hence smaller deltas in the resulting > packfiles. If we add a size check like so[1]: diff --git a/t/perf/lib-bitmap.sh b/t/perf/lib-bitmap.sh index 63d3bc7cec..648cd5b13d 100644 --- a/t/perf/lib-bitmap.sh +++ b/t/perf/lib-bitmap.sh @@ -10,7 +10,11 @@ test_full_bitmap () { { echo HEAD && echo ^$have - } | git pack-objects --revs --stdout >/dev/null + } | git pack-objects --revs --stdout >tmp.pack + ' + + test_size 'fetch size' ' + wc -c <tmp.pack ' test_perf 'pack to file (bitmap)' ' then the results I get using linux.git are: Test origin/tb/multi-pack-bitmaps origin/tb/midx-write-propagate-namehash ------------------------------------------------------------------------------------------------- 5326.4: simulated fetch 2.32(7.16+0.21) 2.00(3.79+0.18) -13.8% 5326.5: fetch size 16.7M 15.5M -7.1% so you can see that we spent about half as much CPU (ignore the wall-clock percentage; the interesting thing is the userspace time, because my machine has 8 cores). But we also shaved off a bit from the pack, so we really did manage to find better deltas, too. I see that Taylor just posted a very similar response, and independently did the exact same experiment I did. ;) I'll send this anyway, though, as my particular run showed slightly different results. -Peff [1] The other thing you'd want (and I presume Taylor was using for his earlier timings) is: diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh index 5845109ac7..a4ac7746a7 100755 --- a/t/perf/p5326-multi-pack-bitmaps.sh +++ b/t/perf/p5326-multi-pack-bitmaps.sh @@ -11,7 +11,7 @@ test_expect_success 'enable multi-pack index' ' ' test_perf 'setup multi-pack index' ' - git repack -ad && + git repack -adb && git multi-pack-index write --bitmap ' since otherwise there is no pack bitmap for the midx to pull the name-hashes from.
On Tue, Sep 14, 2021 at 01:11:34AM -0400, Taylor Blau wrote: > Some experiments to back that up: I instrumented the existing p5326 by > replacing anything like "pack-objects ... --stdout >/dev/null" with > "pack-objects ... --stdout >pack.tmp" and then added test_size's to > measure the size of each pack. > > On the tip of this branch, the results are: > > Test origin/tb/multi-pack-bitmaps HEAD > ---------------------------------------------------------------------------- > 5326.5: simulated clone size 3.3G 3.3G +0.0% > 5326.7: simulated fetch size 10.5M 10.5M -0.2% > 5326.21: clone (partial bitmap) 3.3G 3.3G +0.0% I wouldn't expect a change in the clone size. We're already including all the objects from the single pack, so we won't even look for new deltas. In my run, I did see a small improvement in the fetch size (though my size both before and after was larger than yours). This is going to depend on the exact set of deltas we have (which in turn depends on how your repo happens to have been packed before the script even starts) and which ones the client actually wants (which may depend on the exact tip of your repo). Presumably you also saw a decrease in the user CPU time of 5326.6 here. If not, you may have forgotten the extra patch to create the pack bitmap. -Peff
On Tue, Sep 14, 2021 at 01:27:31AM -0400, Jeff King wrote: > On Tue, Sep 14, 2021 at 01:11:34AM -0400, Taylor Blau wrote: > > > Some experiments to back that up: I instrumented the existing p5326 by > > replacing anything like "pack-objects ... --stdout >/dev/null" with > > "pack-objects ... --stdout >pack.tmp" and then added test_size's to > > measure the size of each pack. > > > > On the tip of this branch, the results are: > > > > Test origin/tb/multi-pack-bitmaps HEAD > > ---------------------------------------------------------------------------- > > 5326.5: simulated clone size 3.3G 3.3G +0.0% > > 5326.7: simulated fetch size 10.5M 10.5M -0.2% > > 5326.21: clone (partial bitmap) 3.3G 3.3G +0.0% > > I wouldn't expect a change in the clone size. We're already including > all the objects from the single pack, so we won't even look for new > deltas. > > In my run, I did see a small improvement in the fetch size (though my > size both before and after was larger than yours). This is going to > depend on the exact set of deltas we have (which in turn depends on how > your repo happens to have been packed before the script even starts) and > which ones the client actually wants (which may depend on the exact tip > of your repo). Yes, I agree with all of that. I am still interested in trying to figure out why the resulting clone size seems to go up (independent of the changes here). I'm bisecting it, but it's slow, since every step requires you to repack the kernel. > Presumably you also saw a decrease in the user CPU time of 5326.6 here. > If not, you may have forgotten the extra patch to create the pack > bitmap. I did, but didn't bother to include the timings in the quoted part, since I already shared them in [1]. I have a handful of new patches for an updated version of this series which explains the extra patch you are talking about, too. Thanks, Taylor [1]: https://lore.kernel.org/git/YT%2F3BuDa7KfUN%2F38@nand.local/
Jeff King <peff@peff.net> writes: > On Mon, Sep 13, 2021 at 07:05:32PM -0700, Junio C Hamano wrote: > >> Taylor Blau <me@ttaylorr.com> writes: >> >> > Alas, there they are. They are basically no different than having the >> > name-hash for single pack bitmaps, it's just now we don't throw them >> > away when generating a MIDX bitmap from a state where the repository >> > already has a single-pack bitmap. >> >> I actually wasn't expecting any CPU/time difference. > > I was, for the same reason we saw an improvement there in ae4f07fbcc > (pack-bitmap: implement optional name_hash cache, 2013-12-21): without a > name-hash, we try a bunch of fruitless deltas before we find a decent > one. Nice. We learn new things every once in a while ;-) >> I hope that we are talking about the same name-hash, which is used >> to sort the blobs so that when pack-objects try to find a good delta >> base, the blobs from the same path will sit close to each other and >> hopefully fit in the pack window. > > Yes, exactly. We spend less time finding the good ones if the likely > candidates are close together. We may _also_ find better ones overall, > depending on the number of candidates and the window size. It is a pleasant surprise that in a real history like linux.git we can even find good delta base without the name hash (unless we are using insanely wide window size, that is). The objects in such a case will be sorted solely by size, larger to smaller, and we need to find a good delta base within that window. It may not be as horrible as fast-import (which only tries to delta against the previous single object), but with ~70k paths in a single revision with history that is ~1m deep, I was pessimistic to see a size-only sort to drop even a pair of blobs from the same path within the default window size of 10. But it seems that such a pessimism is unwarranted, which is a good news ;-).
On Tue, Sep 07 2021, Taylor Blau wrote: > On Wed, Sep 08, 2021 at 03:46:29AM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> On Tue, Sep 07 2021, Taylor Blau wrote: >> >> > Now that we both can propagate values from the hashcache, and respect >> > the configuration to enable the hashcache at all, test that both of >> > these function correctly by hardening their behavior with a test. >> > >> > Signed-off-by: Taylor Blau <me@ttaylorr.com> >> > --- >> > t/t5326-multi-pack-bitmaps.sh | 32 ++++++++++++++++++++++++++++++++ >> > 1 file changed, 32 insertions(+) >> > >> > diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh >> > index 4ad7c2c969..24148ca35b 100755 >> > --- a/t/t5326-multi-pack-bitmaps.sh >> > +++ b/t/t5326-multi-pack-bitmaps.sh >> > @@ -283,4 +283,36 @@ test_expect_success 'pack.preferBitmapTips' ' >> > ) >> > ' >> > >> > +test_expect_success 'hash-cache values are propagated from pack bitmaps' ' >> > + rm -fr repo && >> > + git init repo && >> > + test_when_finished "rm -fr repo" && >> >> It seems the need for this "rm -fr repo" dance instead of just relying >> on test_when_finished "rm -rf repo" is because of a 3x tests in a >> function in tb/multi-pack-bitmaps that should probably be combined into >> one, i.e. they share the same logical "repo" setup. > > Yeah. There's definitely room for clean-up there if we want to have each > of the tests operate on the same repo. I have always found sharing a > repo between tests difficult to reason about, since we have to assume > that any --run parameters could be passed in. > > So I have tended to err on the side of creating and throwing away a new > repo per test, but I understand that's somewhat atypical for Git's > tests. Just in an effort to make myself clear here, because between your note in https://lore.kernel.org/git/YUOds4kcHRgMk5nC@nand.local/ and re-reading my mail here I can barely understand what I meant here :) What I mean is that the whole "rm -rf" dance at the start of tests is fallout from an earlier call you made in c51f5a6437c (t5326: test multi-pack bitmap behavior, 2021-08-31) to split up three tests that really are the same logical test in terms of setup/teardown. If they were to be re-combined as shown in the diff-on-top at the end here none of the tests need a "rm -rf repo" at the beginning, because they can all rely on a starting pattern of: test_when_finished "rm -rf repo" && git init repo && ( cd repo ... ) Or, you can also just not re-use the "repo" name, which is what I did in the fsck tests at https://lore.kernel.org/git/patch-v6-01.22-ebe89f65354-20210907T104559Z-avarab@gmail.com/; then you don't even need test_when_finished "rm -rf[ ...]". None of this requires fixing here or a re-roll, unless you happen to think this diff is the best thing since sliced bread (assume my Signed-off-by). But just as a general musing on patterns to use in tests, I see why you used that 1x test as 3x, because you want "test_expect_success" to give you the label on for each "step". I think that would be worth fixing in test-lib.sh, there's no inherent reason we couldn't support "checkpoints" or "subtests" within "test_expect_success" (the latter being a part of the TAP protocol). But until then IMNSHO this sort of thing is an anti-pattern, i.e. needing to write things like: test_expect_success '[...]' 'A' test_expect_success '[...]' 'B' Where a part of what B needs to do is to clean up the mess left behind A, or relies on its setup. That's just added verbosity and context when reading the code. Ideally you'd just need to read the "setup" at the start, and then individual tests, with this pattern you need to read the preceding test(s) to see where the crap they're cleaning came from, and if it's even needed etc. (For the "setup" part I've suggested a test_expect_setup, see https://lore.kernel.org/git/8735vrvg39.fsf@evledraar.gmail.com/). The "needs the setup" part of this has the negative side-effect of breaking the semantics of the --run option. As noted in the E-Mail linked in the last paragraph it doesn't work well in general because of things like this, but let's steer in the right direction. I.e. with this change you can run: ./t5326-multi-pack-bitmaps.sh --run=111-113 Which is great, usually you need at least 1,<nums you want> ,r 1-10,<nums you want> to get the setup. But because you split them this breaks: ./t5326-multi-pack-bitmaps.sh --run=112-113 I.e. we'll run only the 2nd and 3rd test in a sequence that needs the 1st for setup. For context: My preference for this isn't just an asthetic or theoretical thing. It's really nice to be hacking some code, find that I've broken the bitmap code somehow in your test #112, and be able to just run (since running the whole thing takes 13s on my box, and I'd like to test in a tight loop): ./t5326-multi-pack-bitmaps.sh --run=112 But that breaks as noted above, so I need read earlier tests (I was probably looking at test #112 already at this point) and see how their setup works etc. Anyway, as with so many things in git.git being able to just assume --run works like this isn't something you can rely on in the general case, but just as a matter of where we should be headed... >> > + ( >> > + cd repo && >> > + >> > + git config pack.writeBitmapHashCache true && >> >> s/git config/test_config/, surely. > > No, since this is in a subshell (and we don't care about unsetting the > value of a config in a repo that we're going to throw away, anyways). > > (Side-note: since this has a /bin/sh shebang, we can't detect that we're > in a subshell and so test_config actually _would_ work here. But > switching this test to run with /bin/bash where we can detect whether or > not we're in a subshell would cause this test to fail with test_config). Yes, you're 100% right here. This was purely a misreading on my part, I managed to somehow not see/take into account the subshell. Using test_config makes no sense here. The diff-on-top for discussion mentioned above: diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh index 24148ca35b3..a6bb0abb387 100755 --- a/t/t5326-multi-pack-bitmaps.sh +++ b/t/t5326-multi-pack-bitmaps.sh @@ -133,8 +133,8 @@ bitmap_reuse_tests() { from=$1 to=$2 - test_expect_success "setup pack reuse tests ($from -> $to)" ' - rm -fr repo && + test_expect_success "setup/build/verify ($from -> $to) pack reuse bitmaps" ' + test_when_finished "rm -rf repo" && git init repo && ( cd repo && @@ -148,13 +148,9 @@ bitmap_reuse_tests() { git multi-pack-index write --bitmap else git repack -Adb - fi - ) - ' + fi && - test_expect_success "build bitmap from existing ($from -> $to)" ' - ( - cd repo && + # Build test_commit_bulk --id=further 16 && git tag new-tip && @@ -164,13 +160,9 @@ bitmap_reuse_tests() { git multi-pack-index write --bitmap else git repack -Adb - fi - ) - ' + fi && - test_expect_success "verify resulting bitmaps ($from -> $to)" ' - ( - cd repo && + # Verify git for-each-ref && git rev-list --test-bitmap refs/tags/old-tip && git rev-list --test-bitmap refs/tags/new-tip @@ -183,9 +175,8 @@ bitmap_reuse_tests 'MIDX' 'pack' bitmap_reuse_tests 'MIDX' 'MIDX' test_expect_success 'missing object closure fails gracefully' ' - rm -fr repo && - git init repo && test_when_finished "rm -fr repo" && + git init repo && ( cd repo && @@ -217,9 +208,8 @@ test_expect_success 'setup partial bitmaps' ' basic_bitmap_tests HEAD~ test_expect_success 'removing a MIDX clears stale bitmaps' ' - rm -fr repo && - git init repo && test_when_finished "rm -fr repo" && + git init repo && ( cd repo && test_commit base && @@ -245,8 +235,8 @@ test_expect_success 'removing a MIDX clears stale bitmaps' ' ' test_expect_success 'pack.preferBitmapTips' ' - git init repo && test_when_finished "rm -fr repo" && + git init repo && ( cd repo && @@ -284,9 +274,8 @@ test_expect_success 'pack.preferBitmapTips' ' ' test_expect_success 'hash-cache values are propagated from pack bitmaps' ' - rm -fr repo && - git init repo && test_when_finished "rm -fr repo" && + git init repo && ( cd repo &&
On Fri, Sep 17, 2021 at 10:56:09AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> It seems the need for this "rm -fr repo" dance instead of just relying > >> on test_when_finished "rm -rf repo" is because of a 3x tests in a > >> function in tb/multi-pack-bitmaps that should probably be combined into > >> one, i.e. they share the same logical "repo" setup. > > > > Yeah. There's definitely room for clean-up there if we want to have each > > of the tests operate on the same repo. I have always found sharing a > > repo between tests difficult to reason about, since we have to assume > > that any --run parameters could be passed in. > > > > So I have tended to err on the side of creating and throwing away a new > > repo per test, but I understand that's somewhat atypical for Git's > > tests. > > Just in an effort to make myself clear here, because between your note > in https://lore.kernel.org/git/YUOds4kcHRgMk5nC@nand.local/ and > re-reading my mail here I can barely understand what I meant here :) Thanks for clarifying; if I could summarize I would say that this discussion got started since a new test I added starts with: rm -fr repo && git init repo && test_when_finished "rm -fr repo" where the first rm is designed to clean any state left behind from the split-up tests added in c51f5a6437c. I understand your suggestion, and I even think that it may be worth doing, but I'm not sure that I buy that any such cleanup is related to this series for any reason other than "you happened to add a new test in this file which extended the pattern". So let's pursue the cleanup, but outside of this series (and maybe once the dust has settled more generally on the MIDX bitmaps topics to avoid having the maintainer deal with any conflicts). > Or, you can also just not re-use the "repo" name, which is what I did in > the fsck tests at > https://lore.kernel.org/git/patch-v6-01.22-ebe89f65354-20210907T104559Z-avarab@gmail.com/; > then you don't even need test_when_finished "rm -rf[ ...]". TBH, I think that this is the most appealing thing to do. It's easy and doesn't require us to think too hard about test_expect_setup or sub-tests or anything like that where I'm not sure the complexity is warranted. I would probably do something like this: --- >8 --- diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh index ec4aa89f63..11845f67ae 100755 --- a/t/t5326-multi-pack-bitmaps.sh +++ b/t/t5326-multi-pack-bitmaps.sh @@ -132,12 +132,12 @@ test_expect_success 'clone with bitmaps enabled' ' bitmap_reuse_tests() { from=$1 to=$2 + repo="bitmap-reuse-$from-$to" test_expect_success "setup pack reuse tests ($from -> $to)" ' - rm -fr repo && - git init repo && + git init $repo && ( - cd repo && + cd $repo && test_commit_bulk 16 && git tag old-tip && @@ -154,7 +154,7 @@ bitmap_reuse_tests() { test_expect_success "build bitmap from existing ($from -> $to)" ' ( - cd repo && + cd $repo && test_commit_bulk --id=further 16 && git tag new-tip && @@ -170,7 +170,7 @@ bitmap_reuse_tests() { test_expect_success "verify resulting bitmaps ($from -> $to)" ' ( - cd repo && + cd $repo && git for-each-ref && git rev-list --test-bitmap refs/tags/old-tip && git rev-list --test-bitmap refs/tags/new-tip @@ -183,7 +183,6 @@ bitmap_reuse_tests 'MIDX' 'pack' bitmap_reuse_tests 'MIDX' 'MIDX' test_expect_success 'missing object closure fails gracefully' ' - rm -fr repo && git init repo && test_when_finished "rm -fr repo" && ( @@ -217,7 +216,6 @@ test_expect_success 'setup partial bitmaps' ' basic_bitmap_tests HEAD~ test_expect_success 'removing a MIDX clears stale bitmaps' ' - rm -fr repo && git init repo && test_when_finished "rm -fr repo" && ( @@ -284,7 +282,6 @@ test_expect_success 'pack.preferBitmapTips' ' ' test_expect_success 'hash-cache values are propagated from pack bitmaps' ' - rm -fr repo && git init repo && test_when_finished "rm -fr repo" && (
On Fri, Sep 17 2021, Taylor Blau wrote: > On Fri, Sep 17, 2021 at 10:56:09AM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> It seems the need for this "rm -fr repo" dance instead of just relying >> >> on test_when_finished "rm -rf repo" is because of a 3x tests in a >> >> function in tb/multi-pack-bitmaps that should probably be combined into >> >> one, i.e. they share the same logical "repo" setup. >> > >> > Yeah. There's definitely room for clean-up there if we want to have each >> > of the tests operate on the same repo. I have always found sharing a >> > repo between tests difficult to reason about, since we have to assume >> > that any --run parameters could be passed in. >> > >> > So I have tended to err on the side of creating and throwing away a new >> > repo per test, but I understand that's somewhat atypical for Git's >> > tests. >> >> Just in an effort to make myself clear here, because between your note >> in https://lore.kernel.org/git/YUOds4kcHRgMk5nC@nand.local/ and >> re-reading my mail here I can barely understand what I meant here :) > > Thanks for clarifying; if I could summarize I would say that this > discussion got started since a new test I added starts with: > > rm -fr repo && > git init repo && > test_when_finished "rm -fr repo" > > where the first rm is designed to clean any state left behind from the > split-up tests added in c51f5a6437c. > > I understand your suggestion, and I even think that it may be worth > doing, but I'm not sure that I buy that any such cleanup is related to > this series for any reason other than "you happened to add a new test in > this file which extended the pattern". > > So let's pursue the cleanup, but outside of this series (and maybe once > the dust has settled more generally on the MIDX bitmaps topics to avoid > having the maintainer deal with any conflicts). Yes I agree, FWIW I didn't look carefully at what was in-flight where at the time, or the series this depends on was itself in "seen", I can't remember which.. >> Or, you can also just not re-use the "repo" name, which is what I did in >> the fsck tests at >> https://lore.kernel.org/git/patch-v6-01.22-ebe89f65354-20210907T104559Z-avarab@gmail.com/; >> then you don't even need test_when_finished "rm -rf[ ...]". > > TBH, I think that this is the most appealing thing to do. It's easy and > doesn't require us to think too hard about test_expect_setup or > sub-tests or anything like that where I'm not sure the complexity is > warranted. > > I would probably do something like this: Yeah, that bypasses the cleanup bit, but leaves --run broken as I described. For new code I think it's good practice to have --run work too. Again, I think that's a non-issue here considering the rest of the dumpster fire in the test suite when it comes to that, so I'm not suggesting a re-roll or whatever. I just used the upthread as a jumping off point since you had a question about these patterns in the other topic, and perhaps you/some bystander would be convinced and follow that pattern in the future. > --- >8 --- > > diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh > index ec4aa89f63..11845f67ae 100755 > --- a/t/t5326-multi-pack-bitmaps.sh > +++ b/t/t5326-multi-pack-bitmaps.sh > @@ -132,12 +132,12 @@ test_expect_success 'clone with bitmaps enabled' ' > bitmap_reuse_tests() { > from=$1 > to=$2 > + repo="bitmap-reuse-$from-$to" > > test_expect_success "setup pack reuse tests ($from -> $to)" ' > - rm -fr repo && > - git init repo && > + git init $repo && > ( > - cd repo && > + cd $repo && > test_commit_bulk 16 && > git tag old-tip && > > @@ -154,7 +154,7 @@ bitmap_reuse_tests() { > > test_expect_success "build bitmap from existing ($from -> $to)" ' > ( > - cd repo && > + cd $repo && > test_commit_bulk --id=further 16 && > git tag new-tip && > > @@ -170,7 +170,7 @@ bitmap_reuse_tests() { > > test_expect_success "verify resulting bitmaps ($from -> $to)" ' > ( > - cd repo && > + cd $repo && > git for-each-ref && > git rev-list --test-bitmap refs/tags/old-tip && > git rev-list --test-bitmap refs/tags/new-tip > @@ -183,7 +183,6 @@ bitmap_reuse_tests 'MIDX' 'pack' > bitmap_reuse_tests 'MIDX' 'MIDX' > > test_expect_success 'missing object closure fails gracefully' ' > - rm -fr repo && > git init repo && > test_when_finished "rm -fr repo" && > ( > @@ -217,7 +216,6 @@ test_expect_success 'setup partial bitmaps' ' > basic_bitmap_tests HEAD~ > > test_expect_success 'removing a MIDX clears stale bitmaps' ' > - rm -fr repo && > git init repo && > test_when_finished "rm -fr repo" && > ( > @@ -284,7 +282,6 @@ test_expect_success 'pack.preferBitmapTips' ' > ' > > test_expect_success 'hash-cache values are propagated from pack bitmaps' ' > - rm -fr repo && > git init repo && > test_when_finished "rm -fr repo" && > (
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh index 4ad7c2c969..24148ca35b 100755 --- a/t/t5326-multi-pack-bitmaps.sh +++ b/t/t5326-multi-pack-bitmaps.sh @@ -283,4 +283,36 @@ test_expect_success 'pack.preferBitmapTips' ' ) ' +test_expect_success 'hash-cache values are propagated from pack bitmaps' ' + rm -fr repo && + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + + git config pack.writeBitmapHashCache true && + + test_commit base && + test_commit base2 && + git repack -adb && + + test-tool bitmap dump-hashes >pack.raw && + test_file_not_empty pack.raw && + sort pack.raw >pack.hashes && + + test_commit new && + git repack && + git multi-pack-index write --bitmap && + + test-tool bitmap dump-hashes >midx.raw && + sort midx.raw >midx.hashes && + + # ensure that every namehash in the pack bitmap can be found in + # the midx bitmap (i.e., that there are no oid-namehash pairs + # unique to the pack bitmap). + comm -23 pack.hashes midx.hashes >dropped.hashes && + test_must_be_empty dropped.hashes + ) +' + test_done
Now that we both can propagate values from the hashcache, and respect the configuration to enable the hashcache at all, test that both of these function correctly by hardening their behavior with a test. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/t5326-multi-pack-bitmaps.sh | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+)