[0/2] repack: implement `--cruft-max-size`

Message-ID: cover.1694123506.git.me@ttaylorr.com

Taylor Blau Sept. 7, 2023, 9:51 p.m. UTC
(These patches should be applied on top of a merge of
tb/repack-existing-packs-cleanup and tb/multi-cruft-pack.)

This series attempts to give users some more robust tools for managing
repositories with a large number of unreachable objects by storing them
in separate cruft packs, via a new option `--cruft-max-size`, like so:

    $ git.compile repack -d --cruft --cruft-max-size=10M
    [...]
    Enumerating cruft objects: 617483, done.
    Counting objects: 100% (83791/83791), done.
    Delta compression using up to 20 threads
    Compressing objects: 100% (59696/59696), done.
    Writing objects: 100% (83791/83791), done.
    Total 83791 (delta 19251), reused 82502 (delta 19148), pack-reused 0

    $ ls -la .git/objects/pack/pack-*.mtimes
    -r--r--r-- 1 ttaylorr ttaylorr 179144 Sep  7 17:46 .git/objects/pack/pack-1a95260d26f2897abfd2d54f1d58f535acb81d23.mtimes
    -r--r--r-- 1 ttaylorr ttaylorr    452 Sep  7 17:46 .git/objects/pack/pack-5fde8701ae0f2e5553f1fa33de05faf12f94c07f.mtimes
    -r--r--r-- 1 ttaylorr ttaylorr 155720 Sep  7 17:46 .git/objects/pack/pack-91f9e66921e0ebe1b5e35d34842551468cecdc28.mtimes
    -r--r--r-- 1 ttaylorr ttaylorr     56 Sep  7 17:46 .git/objects/pack/pack-95fe626743207b177b45f32b60fdc313e525ea60.mtimes

The details are explained in the second patch, but the gist is that we
will combine cruft packs up until they reach a certain threshold (as
specified by `--cruft-max-size`) and then begin a new "generation" of
cruft packs. That younger generation grows until it, too, reaches the
configured threshold, at which point it becomes "frozen", and any new
unreachable objects are written into yet another generation of cruft
packs.
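
To make the generational behavior concrete, here is a toy model in
Python. It is only a sketch of the policy described above, not the
logic in builtin/repack.c: the function name `repack_cruft`, the
unitless sizes, and the assumption that everything below the threshold
is merged into a single pack (even if that overshoots the threshold)
are all illustrative.

```python
def repack_cruft(packs, new_bytes, threshold):
    """Toy model of one cruft repack with a size threshold.

    packs:     sizes of the existing cruft packs, oldest first
    new_bytes: size of the objects that became unreachable since last time
    Returns the list of cruft pack sizes after the repack.
    """
    # Packs at or above the threshold are "frozen" and left untouched.
    frozen = [size for size in packs if size >= threshold]
    # Everything else belongs to the young generation and is combined
    # with the newly unreachable objects into one rewritten pack.
    young = [size for size in packs if size < threshold]
    merged = sum(young) + new_bytes
    return frozen + ([merged] if merged else [])


packs = []
history = []
for _ in range(5):
    packs = repack_cruft(packs, new_bytes=4, threshold=10)
    history.append(list(packs))
    print(packs)
```

With a threshold of 10 and 4 units of new cruft per repack, the first
pack grows to 12 and is then left alone, while a second pack starts
over at 4 and begins growing in turn.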

The goal of this series is to reduce I/O churn in repositories that
(a) have a large number of unreachable objects, (b) rarely prune
them, or (c) both.

Instead of having to rewrite a cruft pack containing every unreachable
object in the repository, we only have to rewrite a cruft pack up until
it reaches the given threshold, at which point it is effectively kept
(i.e., it behaves as if the cruft pack had a ".keep" file tied to it,
provided that the threshold is held constant).
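
As a rough illustration of the savings, the sketch below counts how
much cruft data gets rewritten over a series of repacks, with and
without a threshold. It is back-of-the-envelope arithmetic under
simplifying assumptions (a fixed amount of new cruft per repack, no
pruning, arbitrary size units), and `bytes_rewritten` is a hypothetical
helper, not something from this series.

```python
def bytes_rewritten(runs, new_bytes, threshold=None):
    """Total cruft data written over `runs` repacks (toy model).

    threshold=None models a single cruft pack that is rewritten in
    full on every repack.
    """
    open_pack = 0  # size of the cruft pack that is still being rewritten
    total = 0
    for _ in range(runs):
        if threshold is not None and open_pack >= threshold:
            # The pack reached the threshold: leave it alone from now
            # on and start a new one.
            open_pack = 0
        open_pack += new_bytes  # old contents plus the new cruft
        total += open_pack      # the whole open pack is written out
    return total


single = bytes_rewritten(runs=5, new_bytes=4)
capped = bytes_rewritten(runs=5, new_bytes=4, threshold=10)
print(single, capped)
```

In this model, five repacks write 60 units with a single cruft pack
versus 36 units with a threshold of 10, and the gap widens as the
number of repacks grows.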

Thanks in advance for your review!

Taylor Blau (2):
  t7700: split cruft-related tests to t7704
  builtin/repack.c: implement support for `--cruft-max-size`

 Documentation/config/gc.txt  |   6 +
 Documentation/git-gc.txt     |   7 +
 Documentation/git-repack.txt |   9 +
 builtin/gc.c                 |   8 +
 builtin/repack.c             | 133 +++++++++++--
 t/t6500-gc.sh                |  27 +++
 t/t7700-repack.sh            | 121 -----------
 t/t7704-repack-cruft.sh      | 375 +++++++++++++++++++++++++++++++++++
 8 files changed, 553 insertions(+), 133 deletions(-)
 create mode 100755 t/t7704-repack-cruft.sh