mbox series

[v4,0/6] pack-objects: freshen objects with multi-cruft packs

Message ID cover.1741648467.git.me@ttaylorr.com (mailing list archive)
Headers show
Series pack-objects: freshen objects with multi-cruft packs | expand

Message

Taylor Blau March 11, 2025, 12:21 a.m. UTC
Here is a slightly larger reroll of my series to fix object freshening
when using multi-cruft packs that I have been meaning to send for a
couple of days.

I realized after sending the last round that not only was the first
commit from v1 flawed (for the reasons Patrick identified) but that
there is currently no way to grow a new cruft pack past the configured
limit.

Independent of this series suppose for example that we have two 100 MiB
packs, and the threshold is 200 MiB. We should able to in theory combine
those packs together. But we can't! The largest pack we'll make is
199MiB (and change), since builtin/pack-objects.c::write_one() will
refuse to write any object which would bust the limit given by
--max-pack-size.

This series resurrects the first patch from v1 after introducing a
behavior change for 'git pack-objects --cruft --max-pack-size'. When
given with '--cruft', '--max-pack-size' now allows pack-objects to grow
a pack *just* past the given limit by at most one object. This allows
packs to grow past their threshold and age out of the active generation
of cruft packs so they are no longer repacked with each 'git repack
--cruft'.

We're way too late into the -rc cycle for this to land in the
forthcoming release, but I wanted to get this off of my workstation
anyway to allow folks to review it as they have time.

As usual, there is a range-diff since last time. Thanks in advance for
your review!

Taylor Blau (6):
  t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
  t7704-repack-cruft.sh: consolidate `write_blob()`
  t/lib-cruft.sh: extract some cruft-related helpers
  pack-objects: generate cruft packs at most one object over threshold
  builtin/repack.c: simplify cruft pack aggregation
  builtin/pack-objects.c: freshen objects from existing cruft packs

 Documentation/config/pack.adoc      |   4 +
 Documentation/git-pack-objects.adoc |   4 +
 builtin/pack-objects.c              | 150 +++++++++--
 builtin/repack.c                    |  38 +--
 packfile.c                          |   3 +-
 packfile.h                          |   2 +
 t/lib-cruft.sh                      |  23 ++
 t/t5329-pack-objects-cruft.sh       | 317 +++++-------------------
 t/t7704-repack-cruft.sh             | 372 +++++++++++++++++++++++-----
 9 files changed, 544 insertions(+), 369 deletions(-)
 create mode 100644 t/lib-cruft.sh

Range-diff against v3:
-:  ---------- > 1:  390c3a6d85 t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
-:  ---------- > 2:  e7ebe6c460 t7704-repack-cruft.sh: consolidate `write_blob()`
-:  ---------- > 3:  aa7588f817 t/lib-cruft.sh: extract some cruft-related helpers
-:  ---------- > 4:  f2ca92245a pack-objects: generate cruft packs at most one object over threshold
-:  ---------- > 5:  12ddea7603 builtin/repack.c: simplify cruft pack aggregation
1:  c80188164e = 6:  d44a124c81 builtin/pack-objects.c: freshen objects from existing cruft packs

base-commit: 87a0bdbf0f72b7561f3cd50636eee33dcb7dbcc3

Comments

Junio C Hamano March 11, 2025, 8:13 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> Here is a slightly larger reroll of my series to fix object freshening
> when using multi-cruft packs that I have been meaning to send for a
> couple of days.
>
> I realized after sending the last round that not only was the first
> commit from v1 flawed (for the reasons Patrick identified) but that
> there is currently no way to grow a new cruft pack past the configured
> limit.
>
> Independent of this series suppose for example that we have two 100 MiB
> packs, and the threshold is 200 MiB. We should able to in theory combine
> those packs together. But we can't! The largest pack we'll make is
> 199MiB (and change), since builtin/pack-objects.c::write_one() will
> refuse to write any object which would bust the limit given by
> --max-pack-size.

I am not sure why that is a problem.  If we have many loose objects
and the threshold is set at 200, wouldn't we also give up at 199
plus a change when packing these loose objects into a pack?  If the
last object that makes us bust the threshold is unusually large, say
50, we may give up at 150 plus a bit, unless we go back to the queue
and pick smaller objects among the remaining ones to fill the
remaining 50 minus a bit, and because we do not do that to enforce
max-pack-size, I am not sure how "give up just before the threshold"
is too bad and needs to be replaced with "give up just after".

Or is the problem that the threshold is applied differently based on
where the objects come from?  E.g., packing many loose objects would
stop just after, but repacking from cruft would stop just before, or
something?  If the problem is that we are inconsistent, then I would
understand that it may be good to make things consistent.

> This series resurrects the first patch from v1 after introducing a
> behavior change for 'git pack-objects --cruft --max-pack-size'. When
> given with '--cruft', '--max-pack-size' now allows pack-objects to grow
> a pack *just* past the given limit by at most one object.

And what happens when the last object appended is very large, like
70?  Would we end up with 270 when the threshold says 200?

I still am not getting what you are trying to explain in the above
two paragraphs, but in general, "give up just before" would be a
better choice than "give up just after", exactly because the threshold
we are letting the user to give is the maximum.