mbox series

[00/10] repack: support repacking into a geometric sequence

Message ID cover.1611098616.git.me@ttaylorr.com (mailing list archive)
Headers show
Series repack: support repacking into a geometric sequence | expand

Message

Taylor Blau Jan. 19, 2021, 11:23 p.m. UTC
This series introduces a new mode of 'git repack' where (instead of packing just
loose objects or packing everything together into one pack), the set of packs
left forms a geometric progression by object count.

It does not depend on either series of the revindex patches I sent recently.

Roughly speaking, for a given factor, say "d", each pack has at least "d" times
the number of objects as the next largest pack. So, if there are "N" packs,
"P1", "P2", ..., "PN" ordered by object count (where "PN" has the most objects,
and "P1" the fewest), then:

  objects(Pi) > d * objects(P(i-1))

for all 1 < i <= N.

This is done by first ordering packs by object count, and then determining the
longest sequence of large packs which already form a geometric progression. All
packs on the small side of that cut must be repacked together, and so we check
that the existing progression can be maintained with the new pack, and adjust as
necessary.

In actuality, this is approximated in order for 'git repack' to have to create
at most one new pack. The details of this approximation are discussed at length
in the final patch.

'git repack' implements this new option by marking the packs that don't need to
be touched as "frozen" and it does this by marking them as pack_keep_in_core,
and then using a new option pack-objects option '--assume-kept-packs-closed' to
stop the reachability traversal once it encounters any objects in the kept
packs.

When repacking in this mode, the caller implicitly trusts that the unchanged
packs are closed under reachability, and thus they can halt the traversal as
soon as an object in any one of those packs is found.

The first three patches introduce the new revision and pack-objects options
necessary for this to work. The next four patches introduce an MRU cache for
kept packs only. Then a new pack-objects mode is introduced to allow callers to
specify the list of kept packs over stdin in case they are too long to be listed
as arguments. Finally, geometric repacking is introduced

Thanks in advance for your review.

Jeff King (4):
  p5303: add missing &&-chains
  p5303: measure time to repack with keep
  pack-objects: rewrite honor-pack-keep logic
  packfile: add kept-pack cache for find_kept_pack_entry()

Taylor Blau (6):
  packfile: introduce 'find_kept_pack_entry()'
  revision: learn '--no-kept-objects'
  builtin/pack-objects.c: learn '--assume-kept-packs-closed'
  builtin/pack-objects.c: teach '--keep-pack-stdin'
  builtin/repack.c: extract loose object handling
  builtin/repack.c: add '--geometric' option

 Documentation/git-pack-objects.txt |  19 +++
 Documentation/git-repack.txt       |  11 ++
 Documentation/rev-list-options.txt |   7 +
 builtin/pack-objects.c             | 161 ++++++++++++++--------
 builtin/repack.c                   | 206 ++++++++++++++++++++++++++---
 list-objects.c                     |   7 +
 object-store.h                     |  10 ++
 packfile.c                         |  69 ++++++++++
 packfile.h                         |   2 +
 revision.c                         |  15 +++
 revision.h                         |   4 +
 t/perf/p5303-many-packs.sh         |  18 ++-
 t/t6114-keep-packs.sh              | 128 ++++++++++++++++++
 t/t7703-repack-geometric.sh        |  81 ++++++++++++
 14 files changed, 663 insertions(+), 75 deletions(-)
 create mode 100755 t/t6114-keep-packs.sh
 create mode 100755 t/t7703-repack-geometric.sh

Comments

Derrick Stolee Jan. 20, 2021, 2:05 p.m. UTC | #1
On 1/19/2021 6:23 PM, Taylor Blau wrote:
> This series introduces a new mode of 'git repack' where (instead of packing just
> loose objects or packing everything together into one pack), the set of packs
> left forms a geometric progression by object count.
...
> Thanks in advance for your review.

I had the pleasure of reading an early version of this series, but it's
been a while. Upon a fresh reading, I only had nitpicks. Otherwise, this
LGTM.

I encourage other reviewers to read patch 10 carefully, as that is the
most math-heavy of all of them.

Thanks,
-Stolee