Message ID: cover.1701198172.git.me@ttaylorr.com (mailing list archive)
Series: pack-objects: multi-pack verbatim reuse
On Tue, Nov 28, 2023 at 02:07:54PM -0500, Taylor Blau wrote:
> Back in fff42755ef (pack-bitmap: add support for bitmap indexes,
> 2013-12-21), we added support for reachability bitmaps, and taught
> pack-objects how to reuse verbatim chunks from the bitmapped pack. When
> multi-pack bitmaps were introduced, this pack-reuse mechanism evolved to
> use the MIDX's "preferred" pack as the source for verbatim reuse.
>
> This allows repositories to incrementally repack themselves (e.g., using
> a `--geometric` repack), storing the result in a MIDX, and generating a
> corresponding bitmap. This keeps our bitmap coverage up-to-date, while
> maintaining a relatively small number of packs.
>
> However, it is recommended (and matches what we do in production at
> GitHub) that repositories repack themselves all-into-one, and
> generate a corresponding single-pack reachability bitmap. This is done
> for a couple of reasons, but the most relevant one to this series is
> that it enables us to perform verbatim pack-reuse over a complete copy
> of the repository, since the entire repository resides in a single pack
> (and thus is eligible for verbatim pack-reuse).
>
> As repositories grow larger, packing their contents into a single pack
> becomes less feasible. This series extends the pack-reuse mechanism to
> operate over multiple packs which are known ahead of time to be disjoint
> with respect to one another's set of objects.
>
> The implementation has a few components:
>
>   - A new MIDX chunk called "Disjoint packfiles" or DISP is introduced
>     to keep track of the bitmap position, number of objects, and
>     disjointed-ness for each pack contained in the MIDX.
>
>   - A new mode for `git multi-pack-index write --stdin-packs` that
>     allows specifying disjoint packs, as well as a new option
>     `--retain-disjoint` which preserves the set of existing disjoint
>     packs in the new MIDX.
>
>   - A new pack-objects mode `--ignore-disjoint`, which produces packs
>     which are disjoint with respect to the current set of disjoint packs
>     (i.e. it discards any objects from the packing list which appear in
>     any of the known-disjoint packs).
>
>   - A new repack mode, `--extend-disjoint`, which causes any new pack(s)
>     which are generated to be disjoint with respect to the set of packs
>     currently marked as disjoint, minus any pack(s) which are about to
>     be deleted.
>
> With all of that in place, the patch series then rewrites all of the
> pack-reuse functions in terms of the new `bitmapped_pack` structure.
> Once we have dropped all of the assumptions stemming from only
> performing pack-reuse over a single candidate pack, we can then enable
> reuse over all of the disjoint packs.
>
> In addition to the many new tests in t5332 added by that series, I tried
> to simulate a "real world" test on git.git by breaking the repository
> into chunks of 1,000 commits (plus their set of reachable objects not
> reachable from earlier chunk(s)) and packing those chunks. This produces
> a large number of packs with the objects from git.git which are known to
> be disjoint with respect to one another.
>
>     $ git clone git@github.com:git/git.git base
>
>     $ cd base
>     $ mv .git/objects/pack/pack-*.idx{,.bak}
>     $ git unpack-objects <.git/objects/pack/pack-*.pack
>
>     # pack the objects from each successive block of 1k commits
>     $ for rev in $(git rev-list --all | awk '(NR) % 1000 == 0' | tac)
>       do
>         echo $rev |
>           git.compile pack-objects --revs --unpacked .git/objects/pack/pack || return 1
>       done
>     # and grab any stragglers, pruning the unpacked objects
>     $ git repack -d
>
> I then constructed a MIDX and corresponding bitmap
>
>     $ find_pack () {
>         for idx in .git/objects/pack/pack-*.idx
>         do
>           git show-index <$idx | grep -q "$1" && basename $idx
>         done
>       }
>     $ preferred="$(find_pack $(git rev-parse HEAD))"
>
>     $ ( cd .git/objects/pack && ls -1 *.idx ) | sed -e 's/^/+/g' |
>         git.compile multi-pack-index write --bitmap --stdin-packs \
>           --preferred-pack=$preferred
>     $ git for-each-ref --format='%(objectname)' refs/heads refs/tags >in
>
> With all of that in place, I was able to produce a significant speed-up
> by reusing objects from multiple packs:
>
>     $ hyperfine -L v single,multi -n '{v}-pack reuse' 'git.compile -c pack.allowPackReuse={v} pack-objects --revs --stdout --use-bitmap-index --delta-base-offset <in >/dev/null'
>     Benchmark 1: single-pack reuse
>       Time (mean ± σ):      6.094 s ±  0.023 s    [User: 43.723 s, System: 0.358 s]
>       Range (min … max):    6.063 s …  6.126 s    10 runs
>
>     Benchmark 2: multi-pack reuse
>       Time (mean ± σ):     906.5 ms ±   3.2 ms    [User: 1081.5 ms, System: 30.9 ms]
>       Range (min … max):   903.5 ms … 912.7 ms    10 runs
>
>     Summary
>       multi-pack reuse ran
>         6.72 ± 0.03 times faster than single-pack reuse
>
> (There are corresponding tests in p5332 that test different sized chunks
> and measure the runtime performance as well as resulting pack size).
>
> Performing verbatim pack reuse naturally trades off between CPU time and
> the resulting pack size. In the above example, the single-pack reuse
> case produces a clone size of ~194 MB on my machine, while the
> multi-pack reuse case produces a clone size closer to ~266 MB, which is
> a ~37% increase in clone size.

Quite exciting, and a tradeoff that may be worth it for Git hosters. I
expect that this is going to be an extreme example of the benefits
provided by your patch series -- do you by any chance also have "real"
numbers that make it possible to quantify the effect a bit better?

No worry if you don't, I'm just curious.

> I think there is still some opportunity to close this gap, since the
> "packing" strategy here is extremely naive. In a production setting, I'm
> sure that there are more well thought out repacking strategies that
> would produce more similar clone sizes.
>
> I considered breaking this series up into smaller chunks, but was
> unsatisfied with the result. Since this series is rather large, if you
> have alternate suggestions on better ways to structure this, please let
> me know.

The series is indeed very involved to review. I only made it up to patch
8/24 and already spent quite some time on it. So I'd certainly welcome
it if this was split up into smaller parts, but don't have a suggestion
as to how this should be done (also because I didn't yet read the other
16 patches).

I'll review the remaining patches at a later point in time.

Patrick

> Thanks in advance for your review!
>
> Taylor Blau (24):
>   pack-objects: free packing_data in more places
>   pack-bitmap-write: deep-clear the `bb_commit` slab
>   pack-bitmap: plug leak in find_objects()
>   midx: factor out `fill_pack_info()`
>   midx: implement `DISP` chunk
>   midx: implement `midx_locate_pack()`
>   midx: implement `--retain-disjoint` mode
>   pack-objects: implement `--ignore-disjoint` mode
>   repack: implement `--extend-disjoint` mode
>   pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions
>   pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature
>   pack-bitmap: return multiple packs via
>     `reuse_partial_packfile_from_bitmap()`
>   pack-objects: parameterize pack-reuse routines over a single pack
>   pack-objects: keep track of `pack_start` for each reuse pack
>   pack-objects: pass `bitmapped_pack`'s to pack-reuse functions
>   pack-objects: prepare `write_reused_pack()` for multi-pack reuse
>   pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack
>     reuse
>   pack-objects: include number of packs reused in output
>   pack-bitmap: prepare to mark objects from multiple packs for reuse
>   pack-objects: add tracing for various packfile metrics
>   t/test-lib-functions.sh: implement `test_trace2_data` helper
>   pack-objects: allow setting `pack.allowPackReuse` to "single"
>   pack-bitmap: reuse objects from all disjoint packs
>   t/perf: add performance tests for multi-pack reuse
>
>  Documentation/config/pack.txt          |   8 +-
>  Documentation/git-multi-pack-index.txt |  12 ++
>  Documentation/git-pack-objects.txt     |   8 +
>  Documentation/git-repack.txt           |  12 ++
>  Documentation/gitformat-pack.txt       | 109 ++++++++++
>  builtin/multi-pack-index.c             |  13 +-
>  builtin/pack-objects.c                 | 200 +++++++++++++++----
>  builtin/repack.c                       |  57 +++++-
>  midx.c                                 | 218 +++++++++++++++++---
>  midx.h                                 |  11 +-
>  pack-bitmap-write.c                    |   9 +-
>  pack-bitmap.c                          | 265 ++++++++++++++++++++----
>  pack-bitmap.h                          |  18 +-
>  pack-objects.c                         |  15 ++
>  pack-objects.h                         |   1 +
>  t/helper/test-read-midx.c              |  31 ++-
>  t/lib-disjoint.sh                      |  49 +++++
>  t/perf/p5332-multi-pack-reuse.sh       |  81 ++++++++
>  t/t5319-multi-pack-index.sh            | 140 +++++++++++++
>  t/t5331-pack-objects-stdin.sh          | 156 +++++++++++++
>  t/t5332-multi-pack-reuse.sh            | 219 ++++++++++++++++++++
>  t/t6113-rev-list-bitmap-filters.sh     |   2 +
>  t/t7700-repack.sh                      |   4 +-
>  t/t7705-repack-extend-disjoint.sh      | 142 +++++++++++++
>  t/test-lib-functions.sh                |  14 ++
>  25 files changed, 1650 insertions(+), 144 deletions(-)
>  create mode 100644 t/lib-disjoint.sh
>  create mode 100755 t/perf/p5332-multi-pack-reuse.sh
>  create mode 100755 t/t5332-multi-pack-reuse.sh
>  create mode 100755 t/t7705-repack-extend-disjoint.sh
>
>
> base-commit: 564d0252ca632e0264ed670534a51d18a689ef5d
> --
> 2.43.0.24.g980b318f98
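[As an aside for readers following along: the "disjoint" property that the DISP chunk records can be checked with stock Git tooling, since two packs are disjoint exactly when their `.idx` files share no object IDs. Below is a small illustrative sketch (my own, not part of the series); it builds a throwaway repository whose two packs are disjoint by construction, because an incremental `git repack` only packs loose objects:]

```shell
# Build a scratch repo with two packs that are disjoint by construction,
# then verify the property by intersecting their object lists.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email you@example.com
git config user.name you

echo one >f && git add f && git commit -qm one
git repack -q -d          # pack #1: everything so far

echo two >f && git add f && git commit -qm two
git repack -q -d          # pack #2: only the new (loose) objects

set -- .git/objects/pack/*.idx
git show-index <"$1" | cut -d' ' -f2 | sort >a.oids
git show-index <"$2" | cut -d' ' -f2 | sort >b.oids
shared=$(comm -12 a.oids b.oids | wc -l)
echo "shared objects: $shared"
```

[An empty intersection from `comm -12` corresponds to the disjointness that the series relies on when marking packs in the MIDX.]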
On Thu, Nov 30, 2023 at 11:18:19AM +0100, Patrick Steinhardt wrote:
> > Performing verbatim pack reuse naturally trades off between CPU time and
> > the resulting pack size. In the above example, the single-pack reuse
> > case produces a clone size of ~194 MB on my machine, while the
> > multi-pack reuse case produces a clone size closer to ~266 MB, which is
> > a ~37% increase in clone size.
>
> Quite exciting, and a tradeoff that may be worth it for Git hosters. I
> expect that this is going to be an extreme example of the benefits
> provided by your patch series -- do you by any chance also have "real"
> numbers that make it possible to quantify the effect a bit better?
>
> No worry if you don't, I'm just curious.

I don't have a great sense, no. I haven't run these patches yet in
production, although I would like to do so soon for internal
repositories to get a better sense here.

There are some performance tests at the end which try and give you a
sense of at least the relative speed-up depending on how many disjoint
packs you have (IIRC, we test for 1, 10, and 100 disjoint packs).

> > I think there is still some opportunity to close this gap, since the
> > "packing" strategy here is extremely naive. In a production setting, I'm
> > sure that there are more well thought out repacking strategies that
> > would produce more similar clone sizes.
> >
> > I considered breaking this series up into smaller chunks, but was
> > unsatisfied with the result. Since this series is rather large, if you
> > have alternate suggestions on better ways to structure this, please let
> > me know.
>
> The series is indeed very involved to review. I only made it up to patch
> 8/24 and already spent quite some time on it. So I'd certainly welcome
> it if this was split up into smaller parts, but don't have a suggestion
> as to how this should be done (also because I didn't yet read the other
> 16 patches).
I suppose that one way to break it up might be:

  pack-objects: free packing_data in more places
  pack-bitmap-write: deep-clear the `bb_commit` slab
  pack-bitmap: plug leak in find_objects()
  midx: factor out `fill_pack_info()`
  midx: implement `DISP` chunk
  midx: implement `midx_locate_pack()`
  midx: implement `--retain-disjoint` mode
  pack-objects: implement `--ignore-disjoint` mode
  repack: implement `--extend-disjoint` mode
  pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions
  pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature
  pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()`
  pack-objects: parameterize pack-reuse routines over a single pack
  pack-objects: keep track of `pack_start` for each reuse pack
  pack-objects: pass `bitmapped_pack`'s to pack-reuse functions
  pack-objects: prepare `write_reused_pack()` for multi-pack reuse
  pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse
  pack-objects: include number of packs reused in output
  pack-bitmap: prepare to mark objects from multiple packs for reuse
  pack-objects: add tracing for various packfile metrics
  t/test-lib-functions.sh: implement `test_trace2_data` helper
  pack-objects: allow setting `pack.allowPackReuse` to "single"
  pack-bitmap: reuse objects from all disjoint packs
  t/perf: add performance tests for multi-pack reuse

Then you'd have five patch series, where each series does roughly the
following:

  1. Preparatory clean-up.
  2. Implementing the DISP chunk, as well as --retain-disjoint, without
     a way to generate such packs.
  3. Implement a way to generate such packs, but without actually being
     able to reuse more than one of them.
  4. Implement multi-pack reuse, but without actually reusing any packs.
  5. Enable multi-pack reuse (and implement the required scaffolding to
     do so), and test it.

That's the most sensible split that I could come up with, at least.
But I still find it relatively unsatisfying for a couple of reasons:

  - With the exception of the last group of patches, none of the earlier
    series enable any new, useful behavior on their own. IOW, if we just
    merged the first three series and then forgot about this topic, we
    wouldn't have done anything useful ;-).

  - The fourth series (which actually implements multi-pack reuse) still
    remains the most complicated, and would likely be the most difficult
    to review. So I think you'd still have one difficult series to
    review, plus four other series which are slightly less difficult to
    review ;-).

It's entirely possible that I'm just too close to these patches to see a
better split, so if you (or others) have any suggestions on how to break
this up, please don't hesitate to share them.

> I'll review the remaining patches at a later point in time.

Thanks, I'll look forward to more of your review as usual :-).

Thanks,
Taylor
On Thu, Nov 30, 2023 at 02:39:41PM -0500, Taylor Blau wrote:
> On Thu, Nov 30, 2023 at 11:18:19AM +0100, Patrick Steinhardt wrote:
[snip]
> Then you'd have five patch series, where each series does roughly the
> following:
>
>   1. Preparatory clean-up.
>   2. Implementing the DISP chunk, as well as --retain-disjoint, without
>      a way to generate such packs.
>   3. Implement a way to generate such packs, but without actually being
>      able to reuse more than one of them.
>   4. Implement multi-pack reuse, but without actually reusing any packs.
>   5. Enable multi-pack reuse (and implement the required scaffolding to
>      do so), and test it.
>
> That's the most sensible split that I could come up with, at least.

Looks sensible to me.

> But I still find it relatively unsatisfying for a couple of reasons:
>
>   - With the exception of the last group of patches, none of the earlier
>     series enable any new, useful behavior on their own. IOW, if we just
>     merged the first three series and then forgot about this topic, we
>     wouldn't have done anything useful ;-).

Well, sometimes I wish we'd buy more into the iterative style of working
in the Git project, where it's fine to land patch series that only work
into the direction of a specific topic but don't yet do anything
interesting by themselves. The benefits would be both that individual
pieces land faster while also ensuring that the review load is kept at
bay.

But there's of course also downsides to this, especially in an open
source project like Git:

  - Contributors may go away in the middle of their endeavour, leaving
    behind dangling pieces that might have complicated some of our
    architecture without actually reaping the intended benefits.

  - Overall, it may take significantly longer to get all pieces into
    Git.

  - Contributors need to do a much better job to explain where they are
    headed when the series by itself doesn't do anything interesting
    yet.

So I understand why we typically don't work this way.
I wonder whether a compromise would be to continue sending complete
patch series, but explicitly point out "break points" in your patch
series. These break points could be an indicator to the maintainer that
it's fine to merge everything up to it while still keeping out the
remainder of the patch series.

> - The fourth series (which actually implements multi-pack reuse) still
>   remains the most complicated, and would likely be the most difficult
>   to review. So I think you'd still have one difficult series to
>   review, plus four other series which are slightly less difficult to
>   review ;-).

That's fine in my opinion, there's no surprise here that a complicated
topic is demanding for both the author and the reviewer.

Patrick
On Fri, Dec 01, 2023 at 09:31:14AM +0100, Patrick Steinhardt wrote:
> > But I still find it relatively unsatisfying for a couple of reasons:
> >
> >   - With the exception of the last group of patches, none of the earlier
> >     series enable any new, useful behavior on their own. IOW, if we just
> >     merged the first three series and then forgot about this topic, we
> >     wouldn't have done anything useful ;-).
>
> Well, sometimes I wish we'd buy more into the iterative style of working
> in the Git project, where it's fine to land patch series that only work
> into the direction of a specific topic but don't yet do anything
> interesting by themselves. The benefits would be both that individual
> pieces land faster while also ensuring that the review load is kept at
> bay.
>
> But there's of course also downsides to this, especially in an open
> source project like Git:

I tend to agree with the downsides you list. My biggest concern with
this series in particular is that we're trying to break down an
all-or-nothing change into smaller components. So if we landed four out
of the five of those series, it would be better to have landed none of
them, since the first four aren't really all that useful on their own.

I suppose if we're relatively confident that the last series will be
merged eventually, then that seems like less of a concern. But I'm not
sure that we're at that point yet.

> I wonder whether a compromise would be to continue sending complete
> patch series, but explicitly point out "break points" in your patch
> series. These break points could be an indicator to the maintainer that
> it's fine to merge everything up to it while still keeping out the
> remainder of the patch series.

I think that's a reasonable alternative. It does create a minor headache
for the maintainer[^1], though, so I'd like to avoid it if possible.
> > - The fourth series (which actually implements multi-pack reuse) still
> >   remains the most complicated, and would likely be the most difficult
> >   to review. So I think you'd still have one difficult series to
> >   review, plus four other series which are slightly less difficult to
> >   review ;-).
>
> That's fine in my opinion, there's no surprise here that a complicated
> topic is demanding for both the author and the reviewer.

My preference is to avoid splitting the series if we can help it. But if
you feel strongly, or others feel similarly, I'm happy to take another
crack at breaking it up. Thanks for all of your review so far!

Thanks,
Taylor
On Fri, Dec 01, 2023 at 03:02:22PM -0500, Taylor Blau wrote:
> On Fri, Dec 01, 2023 at 09:31:14AM +0100, Patrick Steinhardt wrote:
[snip]
> I suppose if we're relatively confident that the last series will be
> merged eventually, then that seems like less of a concern. But I'm not
> sure that we're at that point yet.

That's an additional valid concern indeed.

[snip]
> > > - The fourth series (which actually implements multi-pack reuse) still
> > >   remains the most complicated, and would likely be the most difficult
> > >   to review. So I think you'd still have one difficult series to
> > >   review, plus four other series which are slightly less difficult to
> > >   review ;-).
> >
> > That's fine in my opinion, there's no surprise here that a complicated
> > topic is demanding for both the author and the reviewer.
>
> My preference is to avoid splitting the series if we can help it. But if
> you feel strongly, or others feel similarly, I'm happy to take another
> crack at breaking it up. Thanks for all of your review so far!

I don't feel strongly about this at all, I've only tried to spell out my
own thoughts in this context as I thought they were kind of relevant
here.

I've thought quite a lot about this topic recently due to my work on the
reftable backend, where I'm trying to get as many pieces as possible
landed individually before landing the actual backend itself. It's
working well for most of the part, but in other contexts it's a bit
weird that we try to cater towards something that doesn't exist yet. But
naturally, the reftable work is of a different nature than the topic you
work on here, and thus my own takeaways may likely not apply here.

To summarize, I think there is merit in splitting up patches into chunks
that make it in individually and thus gradually work toward a topic, but
I also totally understand why you (or Junio as the maintainer) might
think that this is not a good idea. The ultimate decision for how to
handle topics should be with the patch series' author and maintainer.

Patrick
On Tue, Nov 28, 2023 at 02:07:54PM -0500, Taylor Blau wrote:
> Performing verbatim pack reuse naturally trades off between CPU time and
> the resulting pack size. In the above example, the single-pack reuse
> case produces a clone size of ~194 MB on my machine, while the
> multi-pack reuse case produces a clone size closer to ~266 MB, which is
> a ~37% increase in clone size.

Right, it's definitely a tradeoff. So taking a really big step back,
there are a few optimizations all tied up in the verbatim reuse code:

  1. in some cases we get to dump whole swaths of the on-disk packfile
     to the output, covering many objects with a few memcpy() calls.
     (This is still O(n), of course, but it's fewer instructions per
     object).

  2. any other reused objects have only a small-ish amount of work to
     fix up ofs deltas, handle gaps, and so on. We get to skip adding
     them to the packing_list struct (this saves some CPU, but also a
     lot of memory).

  3. we skip the delta search for these reused objects. This is where
     your big CPU / output size tradeoff comes into play, I'd think.

So my question is: how much of what you're seeing is from (1) and (2),
and how much is from (3)? Because there are other ways to trigger (3),
such as lowering the window size. For example, if you try your same
packing example with --window=0, how do the CPU and output size compare
to the results of your series? (I'd also check peak memory usage).

-Peff
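[For readers who want to reproduce the spirit of this experiment without a multi-pack setup, here is a small sketch (my own, using a synthetic throwaway repository) of the `--window=0` comparison suggested above, showing how disabling the delta search inflates the pack:]

```shell
# Compare pack sizes with a normal delta window vs. --window=0 (no delta
# search) on a scratch repository seeded with delta-friendly blobs.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email you@example.com
git config user.name you

# 30 commits whose blobs are growing prefixes of one another, so the
# delta search has plenty to find.
i=1
while [ $i -le 30 ]; do
  seq 1 $((i * 20)) >f
  git add f
  git commit -qm "c$i"
  i=$((i + 1))
done

git rev-list --objects --all | cut -d' ' -f1 >objs
with_deltas=$(git pack-objects --window=10 --stdout <objs 2>/dev/null | wc -c)
no_deltas=$(git pack-objects --window=0 --stdout <objs 2>/dev/null | wc -c)
echo "window=10: $with_deltas bytes; window=0: $no_deltas bytes"
```

[The window=0 pack should come out noticeably larger, which is the same size-for-CPU trade that skipping the delta search via verbatim reuse makes.]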
On Tue, Dec 12, 2023 at 03:12:38AM -0500, Jeff King wrote:
> So my question is: how much of what you're seeing is from (1) and (2),
> and how much is from (3)? Because there are other ways to trigger (3),
> such as lowering the window size. For example, if you try your same
> packing example with --window=0, how do the CPU and output size compare
> to the results of your series? (I'd also check peak memory usage).

Interesting question! Here are some preliminary numbers on my machine
(which runs Debian unstable with an Intel Xeon W-2255 CPU @ 3.70GHz and
64GB of RAM).

I ran the following hyperfine command on my testing repository, which
has the Git repository broken up into ~75 packs or so:

    $ hyperfine -L v single,multi -L window 0,10 \
        --show-output \
        -n '{v}-pack reuse, pack.window={window}' \
        'git.compile \
           -c pack.allowPackReuse={v} \
           -c pack.window={window} \
           pack-objects --revs --stdout --use-bitmap-index --delta-base-offset <in 2>/dev/null | wc -c'

Which gave the following results for runtime:

    Benchmark 1: single-pack reuse, pack.window=0
      [...]
      Time (mean ± σ):      1.248 s ±  0.004 s    [User: 1.160 s, System: 0.188 s]
      Range (min … max):    1.244 s …  1.259 s    10 runs

    Benchmark 2: multi-pack reuse, pack.window=0
      [...]
      Time (mean ± σ):      1.075 s ±  0.005 s    [User: 0.990 s, System: 0.188 s]
      Range (min … max):    1.071 s …  1.088 s    10 runs

    Benchmark 3: single-pack reuse, pack.window=10
      [...]
      Time (mean ± σ):      6.281 s ±  0.024 s    [User: 43.727 s, System: 0.492 s]
      Range (min … max):    6.252 s …  6.326 s    10 runs

    Benchmark 4: multi-pack reuse, pack.window=10
      [...]
      Time (mean ± σ):      1.028 s ±  0.002 s    [User: 1.150 s, System: 0.184 s]
      Range (min … max):    1.026 s …  1.032 s    10 runs

Here are the average numbers for the resulting "clone" size in each of
the above configurations:

    Benchmark 1: single-pack reuse, pack.window=0
      264.443 MB
    Benchmark 2: multi-pack reuse, pack.window=0
      268.670 MB
    Benchmark 3: single-pack reuse, pack.window=10
      194.355 MB
    Benchmark 4: multi-pack reuse, pack.window=10
      266.473 MB

So it looks like setting pack.window=0 (with both single and multi-pack
reuse) yields a similarly sized pack output as multi-pack reuse with any
window setting.

Running the same benchmark as above again, but this time sending the
pack output to /dev/null and instead capturing the maximum RSS value
from `/usr/bin/time -v` gives us the following (averages, in MB):

    Benchmark 1: single-pack reuse, pack.window=0
      354.224 MB (max RSS)
    Benchmark 2: multi-pack reuse, pack.window=0
      315.730 MB (max RSS)
    Benchmark 3: single-pack reuse, pack.window=10
      470.651 MB (max RSS)
    Benchmark 4: multi-pack reuse, pack.window=10
      328.786 MB (max RSS)

So memory usage is similar between runs except for the single-pack reuse
case with a window size of 10.

It looks like the sweet spot is probably multi-pack reuse with a
small-ish window size, which achieves the best of both worlds (small
pack size, relative to other configurations that reuse large portions of
the pack, and low memory usage).

It's pretty close between multi-pack reuse with a window size of 0 and a
window size of 10. If you want to optimize for pack size, you could
trade a ~1% reduction in pack size for a ~4% increase in peak memory
usage.

Of course, YMMV depending on the repository, packing strategy, and
pack.window configuration (among others) while packing. But this should
give you a general idea of what to expect.

Thanks,
Taylor
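[As a quick sanity check on the ratios discussed in this exchange, the percentages can be recomputed from the measured figures above (an illustrative awk sketch; the constants are the clone-size, pack-size, and max-RSS numbers from the benchmark output):]

```shell
# Recompute the quoted ratios from the measured numbers (in MB):
#   clone sizes: 194.355 (single/window=10) vs 266.473 (multi/window=10)
#   pack sizes:  268.670 (multi/window=0)   vs 266.473 (multi/window=10)
#   peak RSS:    315.730 (multi/window=0)   vs 328.786 (multi/window=10)
summary=$(awk 'BEGIN {
  clone = (266.473 / 194.355 - 1) * 100   # multi vs single clone size
  pack  = (1 - 266.473 / 268.670) * 100   # pack shrink, window 10 vs 0
  rss   = (328.786 / 315.730 - 1) * 100   # RSS growth, window 10 vs 0
  printf "clone +%.1f%% / pack -%.1f%% / rss +%.1f%%", clone, pack, rss
}')
echo "$summary"   # -> clone +37.1% / pack -0.8% / rss +4.1%
```

[So, by these figures, the window-size tradeoff under multi-pack reuse is a bit under one percent of pack size against roughly four percent of peak RSS.]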
On Fri, Dec 15, 2023 at 10:37:57AM -0500, Taylor Blau wrote:
> On Tue, Dec 12, 2023 at 03:12:38AM -0500, Jeff King wrote:
> > So my question is: how much of what you're seeing is from (1) and (2),
> > and how much is from (3)? Because there are other ways to trigger (3),
> > such as lowering the window size. For example, if you try your same
> > packing example with --window=0, how do the CPU and output size compare
> > to the results of your series? (I'd also check peak memory usage).
>
> Interesting question! Here are some preliminary numbers on my machine
> (which runs Debian unstable with an Intel Xeon W-2255 CPU @ 3.70GHz and
> 64GB of RAM).
>
> I ran the following hyperfine command on my testing repository, which
> has the Git repository broken up into ~75 packs or so:

Thanks for running these tests. The results are similar to what I
expected, which is: yes, most of your CPU savings are from skipping
deltas, but not all.

Here's what I see (which I think is mostly redundant with what you've
said, but I just want to lay out my line of thinking). I'll reorder your
quoted sections a bit as I go:

> Benchmark 2: multi-pack reuse, pack.window=0
> [...]
>   Time (mean ± σ):      1.075 s ±  0.005 s    [User: 0.990 s, System: 0.188 s]
>   Range (min … max):    1.071 s …  1.088 s    10 runs
>
> Benchmark 4: multi-pack reuse, pack.window=10
> [...]
>   Time (mean ± σ):      1.028 s ±  0.002 s    [User: 1.150 s, System: 0.184 s]
>   Range (min … max):    1.026 s …  1.032 s    10 runs

OK, so when we're doing more full ("multi") reuse, the pack window
doesn't make a big difference either way. You didn't show the stderr
from each, but presumably most of the objects are hitting the "reuse"
path, and only a few are deltas (and that is backed up by the fact that
doing deltas only gives us a slight improvement in the output size):

> Benchmark 2: multi-pack reuse, pack.window=0
>   268.670 MB
> Benchmark 4: multi-pack reuse, pack.window=10
>   266.473 MB

Comparing the runs with less reuse:

> Benchmark 1: single-pack reuse, pack.window=0
> [...]
>   Time (mean ± σ):      1.248 s ±  0.004 s    [User: 1.160 s, System: 0.188 s]
>   Range (min … max):    1.244 s …  1.259 s    10 runs
>
> Benchmark 3: single-pack reuse, pack.window=10
> [...]
>   Time (mean ± σ):      6.281 s ±  0.024 s    [User: 43.727 s, System: 0.492 s]
>   Range (min … max):    6.252 s …  6.326 s    10 runs

there obviously is a huge amount of time saved by not doing deltas, but
we pay for it with a much bigger pack:

> Benchmark 1: single-pack reuse, pack.window=0
>   264.443 MB
> Benchmark 3: single-pack reuse, pack.window=10
>   194.355 MB

But of course that "much bigger" pack is about the same size as the one
we get from doing multi-pack reuse. Which is not surprising, because
both are avoiding looking for new deltas (and the packs after the
preferred one probably have mediocre deltas).

So I do actually think that disabling pack.window gives you a
similar-ish tradeoff to expanding the pack-reuse code (~6s down to ~1s,
and a 36% embiggening of the resulting pack size). Which implies that
one option is to scrap your entire series and just set pack.window.
Basically comparing multi/10 (your patches) to single/0 (a hypothetical
config option), which have similar run-times and pack sizes.

But that's not quite the whole story. There is still a CPU improvement
in your series (1.2s vs 1.0s, a 20% speedup). And as I'd expect, a
memory improvement from avoiding the extra book-keeping (almost 10%):

> Benchmark 1: single-pack reuse, pack.window=0
>   354.224 MB (max RSS)
> Benchmark 4: multi-pack reuse, pack.window=10
>   328.786 MB (max RSS)

So while it's a lot less code to just set the window size, I do think
those improvements are worth it. And really, it's the same tradeoff we
make for the single-pack case (i.e., one could argue that we
could/should rip out the verbatim-reuse code entirely in favor of just
tweaking the window size).

> It's pretty close between multi-pack reuse with a window size of 0 and
> a window size of 10. If you want to optimize for pack size, you could
> trade a ~1% reduction in pack size for a ~4% increase in peak memory
> usage.

I think if you want to optimize for pack size, you should consider
repacking all-into-one to get better on-disk deltas. ;) I know that's
easier said than done when the I/O costs are significant.

I do wonder if storing thin packs on disk would let us more cheaply
reach a state that could serve optimal-ish packs without spending CPU
computing bespoke deltas for each client. But that's a much larger
topic.

-Peff
On Thu, Dec 21, 2023 at 06:13:33AM -0500, Jeff King wrote:
> But that's not quite the whole story. There is still a CPU improvement
> in your series (1.2s vs 1.0s, a 20% speedup). And as I'd expect, a
> memory improvement from avoiding the extra book-keeping (almost 10%):
>
> > Benchmark 1: single-pack reuse, pack.window=0
> >   354.224 MB (max RSS)
> > Benchmark 4: multi-pack reuse, pack.window=10
> >   328.786 MB (max RSS)

I agree. And I expect that we'd see larger savings on larger, real-world
repositories (the numbers here are generated from a semi out-of-date
copy of git.git).

> So while it's a lot less code to just set the window size, I do think
> those improvements are worth it. And really, it's the same tradeoff we
> make for the single-pack case (i.e., one could argue that we
> could/should rip out the verbatim-reuse code entirely in favor of just
> tweaking the window size).

Definitely an interesting direction. One question that comes to mind is
whether or not we'd want to keep the "verbatim" reuse code around to
avoid adding objects to the packing list directly. That's a
non-negligible memory savings when generating large packs where we can
reuse large swaths of existing packs.

That seems like a topic for another day, though ;-). Not quite
left-over-bits, so maybe... #leftoverboulders?

Thanks,
Taylor