mbox series

[v2,0/5] repack: introduce '--combine-cruft-below-size'

Message ID cover.1742424671.git.me@ttaylorr.com (mailing list archive)
Headers show
Series repack: introduce '--combine-cruft-below-size' | expand

Message

Taylor Blau March 19, 2025, 10:52 p.m. UTC
Here is a small reroll of my series to introduce a new 'repack' flag
called '--combine-cruft-below-size'. There aren't any functional changes
in this round, and the changes that do exist are limited to
documentation and commit message tweaks in the final patch.

A range-diff is included below, as well as the original cover letter:

(This is based on tb/multi-cruft-pack-refresh-fix from Junio's tree,
which is at 08f612ba70 (builtin/pack-objects.c: freshen objects from
existing cruft packs, 2025-03-13) at the time of writing).

This series replaces something close to the existing behavior of
repack's '--max-cruft-size' flag with '--combine-cruft-below-size'.

The new flag is much clearer in its intent and function, and avoids the
lack of clarity between the two that was discussed in

  <cover.1741648467.git.me@ttaylorr.com>

The new behavior is as follows:

  - '--max-cruft-size' is a cruft pack-specific override for repack's
    '--max-pack-size' command-line flag.

  - '--combine-cruft-below-size' instructs repack to only combine cruft
    packs which are smaller than the given threshold. This will likely
    result in packs which are larger than the threshold. But that is OK:
    the point is to limit the size of the individual packs on input, not
    the size of the outgoing pack.

This series does break the existing behavior of '--max-cruft-size'. But
I think breaking backwards compatibility here is OK, since the existing
behavior was so broken to begin with. I'm happy to consider other
alternatives if others feel that this isn't OK.

The series has an interesting structure that I feel may be worth calling
out. The first three patches are trivial test cleanups. The fourth patch
modifies the existing behavior of '--max-cruft-size', but does so while
keeping some of the old infrastructure around.

That may seem like an unnecessarily complicated approach, but it greatly
simplifies introducing the new behavior in the following (and final)
commit. I think that this results in a series that is a little easier to
review (since we don't see a ton of code being removed in one commit and
then re-added in another immediately following it). But if others feel
like this was a mistake, please let me know ;-).

Thanks in advance for your review!

Taylor Blau (5):
  t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
  t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
  t/t7704-repack-cruft.sh: consolidate `write_blob()`
  repack: avoid combining cruft packs with `--max-cruft-size`
  repack: begin combining cruft packs with `--combine-cruft-below-size`

 Documentation/git-repack.adoc |  21 ++-
 builtin/repack.c              |  62 +++----
 t/t5329-pack-objects-cruft.sh | 302 ++++++----------------------------
 t/t7704-repack-cruft.sh       | 293 ++++++++++++++++++++++++++++++---
 4 files changed, 355 insertions(+), 323 deletions(-)

Range-diff against v1:
1:  0aa8aa65c1 = 1:  0aa8aa65c1 t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
2:  5e8bd3e66e = 2:  5e8bd3e66e t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
3:  b075ad8601 = 3:  b075ad8601 t/t7704-repack-cruft.sh: consolidate `write_blob()`
4:  7941997e33 = 4:  7941997e33 repack: avoid combining cruft packs with `--max-cruft-size`
5:  7f120c35e9 ! 5:  dee780a2aa repack: begin combining cruft packs with `--combine-cruft-below-size`
    @@ Commit message
         Introduce a new flag, '--combine-cruft-below-size' which is a
         replacement for the old behavior of '--max-cruft-size'. This new flag
         does explicitly what it says: it combines together cruft packs which are
    -    smaller than a given threshold, and prohibits repacking ones which are
    +    smaller than a given threshold, and leaves alone ones which are
         larger.

         This accomplishes the original intent of '--max-cruft-size', which was
    @@ Commit message
         But that's OK: the point isn't to restrict the size of the cruft packs
         we generate, it's to avoid working with ones that have already grown too
         large. If repositories still want to limit the size of the generated
    -    cruft pack(s), they may use '--max-cruft-size' instead.
    +    cruft pack(s), they may use '--max-cruft-size'.

         There's some minor test fallout as a result of the slight differences in
         behavior between the old meaning of '--max-cruft-size' and the behavior
    @@ Documentation/git-repack.adoc: to the new separate pack will be written.

     +--combine-cruft-below-size=<n>::
     +	When generating cruft packs without pruning, only repack
    -+	existing cruft packs whose size is strictly less than `<n>`.
    -+	Cruft packs whose size is greater than or equal to `<n>` are
    -+	left as-is and not repacked. Useful when you want to avoid
    -+	repacking large cruft pack(s) in repositories that have many
    -+	and/or large unreachable objects.
    ++	existing cruft packs whose size is strictly less than `<n>`,
    ++	where `<n>` represents a number of bytes, which can optionally
    ++	be suffixed with "k", "m", or "g". Cruft packs whose size is
    ++	greater than or equal to `<n>` are left as-is and not repacked.
    ++	Useful when you want to avoid repacking large cruft pack(s) in
    ++	repositories that have many and/or large unreachable objects.
     +
      --expire-to=<dir>::
      	Write a cruft pack containing pruned objects (if any) to the

base-commit: 08f612ba7000bf181ef6d8baed9ece322e567efd
--
2.49.0.4.ge59cf92f8d

Comments

Elijah Newren March 20, 2025, 4:30 a.m. UTC | #1
On Wed, Mar 19, 2025 at 3:52 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> Here is a small reroll of my series to introduce a new 'repack' flag
> called '--combine-cruft-below-size'. There aren't any functional changes
> in this round, and the changes that do exist are limited to
> documentation and commit message tweaks in the final patch.
>
> A range-diff is included below, as well as the original cover letter:
>
> (This is based on tb/multi-cruft-pack-refresh-fix from Junio's tree,
> which is at 08f612ba70 (builtin/pack-objects.c: freshen objects from
> existing cruft packs, 2025-03-13) at the time of writing).
>
> This series replaces something close to the existing behavior of
> repack's '--max-cruft-size' flag with '--combine-cruft-below-size'.
>
> The new flag is much clearer in its intent and function, and avoids the
> lack of clarity between the two that was discussed in
>
>   <cover.1741648467.git.me@ttaylorr.com>
>
> The new behavior is as follows:
>
>   - '--max-cruft-size' is a cruft pack-specific override for repack's
>     '--max-pack-size' command-line flag.
>
>   - '--combine-cruft-below-size' instructs repack to only combine cruft
>     packs which are smaller than the given threshold. This will likely
>     result in packs which are larger than the threshold. But that is OK:
>     the point is to limit the size of the individual packs on input, not
>     the size of the outgoing pack.
>
> This series does break the existing behavior of '--max-cruft-size'. But
> I think breaking backwards compatibility here is OK, since the existing
> behavior was so broken to begin with. I'm happy to consider other
> alternatives if others feel that this isn't OK.
>
> The series has an interesting structure that I feel may be worth calling
> out. The first three patches are trivial test cleanups. The fourth patch
> modifies the existing behavior of '--max-cruft-size', but does so while
> keeping some of the old infrastructure around.
>
> That may seem like an unnecessarily complicated approach, but it greatly
> simplifies introducing the new behavior in the following (and final)
> commit. I think that this results in a series that is a little easier to
> review (since we don't see a ton of code being removed in one commit and
> then re-added in another immediately following it). But if others feel
> like this was a mistake, please let me know ;-).
>
> Thanks in advance for your review!
>
> Taylor Blau (5):
>   t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
>   t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
>   t/t7704-repack-cruft.sh: consolidate `write_blob()`
>   repack: avoid combining cruft packs with `--max-cruft-size`
>   repack: begin combining cruft packs with `--combine-cruft-below-size`
>
>  Documentation/git-repack.adoc |  21 ++-
>  builtin/repack.c              |  62 +++----
>  t/t5329-pack-objects-cruft.sh | 302 ++++++----------------------------
>  t/t7704-repack-cruft.sh       | 293 ++++++++++++++++++++++++++++++---
>  4 files changed, 355 insertions(+), 323 deletions(-)
>
> Range-diff against v1:
> 1:  0aa8aa65c1 = 1:  0aa8aa65c1 t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
> 2:  5e8bd3e66e = 2:  5e8bd3e66e t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
> 3:  b075ad8601 = 3:  b075ad8601 t/t7704-repack-cruft.sh: consolidate `write_blob()`
> 4:  7941997e33 = 4:  7941997e33 repack: avoid combining cruft packs with `--max-cruft-size`
> 5:  7f120c35e9 ! 5:  dee780a2aa repack: begin combining cruft packs with `--combine-cruft-below-size`
>     @@ Commit message
>          Introduce a new flag, '--combine-cruft-below-size' which is a
>          replacement for the old behavior of '--max-cruft-size'. This new flag
>          does explicitly what it says: it combines together cruft packs which are
>     -    smaller than a given threshold, and prohibits repacking ones which are
>     +    smaller than a given threshold, and leaves alone ones which are
>          larger.
>
>          This accomplishes the original intent of '--max-cruft-size', which was
>     @@ Commit message
>          But that's OK: the point isn't to restrict the size of the cruft packs
>          we generate, it's to avoid working with ones that have already grown too
>          large. If repositories still want to limit the size of the generated
>     -    cruft pack(s), they may use '--max-cruft-size' instead.
>     +    cruft pack(s), they may use '--max-cruft-size'.
>
>          There's some minor test fallout as a result of the slight differences in
>          behavior between the old meaning of '--max-cruft-size' and the behavior
>     @@ Documentation/git-repack.adoc: to the new separate pack will be written.
>
>      +--combine-cruft-below-size=<n>::
>      +  When generating cruft packs without pruning, only repack
>     -+  existing cruft packs whose size is strictly less than `<n>`.
>     -+  Cruft packs whose size is greater than or equal to `<n>` are
>     -+  left as-is and not repacked. Useful when you want to avoid
>     -+  repacking large cruft pack(s) in repositories that have many
>     -+  and/or large unreachable objects.
>     ++  existing cruft packs whose size is strictly less than `<n>`,
>     ++  where `<n>` represents a number of bytes, which can optionally
>     ++  be suffixed with "k", "m", or "g". Cruft packs whose size is
>     ++  greater than or equal to `<n>` are left as-is and not repacked.
>     ++  Useful when you want to avoid repacking large cruft pack(s) in
>     ++  repositories that have many and/or large unreachable objects.
>      +
>       --expire-to=<dir>::
>         Write a cruft pack containing pruned objects (if any) to the
>
> base-commit: 08f612ba7000bf181ef6d8baed9ece322e567efd
> --
> 2.49.0.4.ge59cf92f8d

v2 + your comments address all my feedback from v1, so this round
looks good to me.