[00/19] midx: incremental multi-pack indexes, part one

Message ID cover.1717715060.git.me@ttaylorr.com (mailing list archive)

Message

Taylor Blau June 6, 2024, 11:04 p.m. UTC
This series implements incremental MIDXs, which allow for storing
a MIDX across multiple layers, each with their own distinct set of
packs.

MOTIVATION
==========

Doing so allows large repositories to make use of the MIDX feature
without having to rewrite the entire MIDX every time they want to update
the set of packs contained in the MIDX. For extremely large
repositories, doing so is often infeasible.

OVERVIEW
========

This series implements the first component of incremental MIDXs, meaning
by the end of it you can run:

    $ git multi-pack-index write --incremental

a couple of times, and produce a directory structure like:

    $ tree .git/objects/pack/multi-pack-index.d
    .git/objects/pack/multi-pack-index.d
    ├── multi-pack-index-chain
    ├── multi-pack-index-baa53bc5092bed50378fe9232ae7878828df2890.midx
    └── multi-pack-index-f60023a8a104be94eab96dd7c42a6a5db67c82ba.midx

where each *.midx file behaves the same way as the existing
non-incremental MIDX does today, but the layers are stitched together
so that you do not have to rewrite the whole MIDX every time you want
to modify it.
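
As a rough illustration (not taken from the series itself), the layout
above can be walked with a few lines of Python. The sketch assumes that
`multi-pack-index-chain` is a plain-text file listing one layer hash per
line, and that each hash maps to `multi-pack-index-<hash>.midx` in the
same directory. The function name `read_midx_chain` is hypothetical, and
the order of the two hashes in the demo is arbitrary.

```python
import tempfile
from pathlib import Path

# Assumption: the chain file is plain text with one layer hash per line.
CHAIN_NAME = "multi-pack-index-chain"


def read_midx_chain(midx_dir):
    """Return the layer paths named by the chain file, in file order.

    Hypothetical helper for illustration only; it is not part of Git.
    """
    midx_dir = Path(midx_dir)
    layers = []
    for line in (midx_dir / CHAIN_NAME).read_text().splitlines():
        layer_hash = line.strip()
        if layer_hash:  # skip blank lines
            layers.append(midx_dir / f"multi-pack-index-{layer_hash}.midx")
    return layers


if __name__ == "__main__":
    # Hashes copied from the directory listing above; their order in the
    # chain is chosen arbitrarily for this demo.
    hashes = [
        "baa53bc5092bed50378fe9232ae7878828df2890",
        "f60023a8a104be94eab96dd7c42a6a5db67c82ba",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        midx_dir = Path(tmp) / "multi-pack-index.d"
        midx_dir.mkdir()
        (midx_dir / CHAIN_NAME).write_text("\n".join(hashes) + "\n")
        for layer in read_midx_chain(midx_dir):
            print(layer.name)
```

The sketch only maps chain entries to file names. It does not validate
the hashes, check that the layer files exist, or parse the MIDX data.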

This is "part one" of a multi-part series. The overview of how all of
these series fit together is as follows:

  - "Part zero": preparatory work like 'tb/midx-write-cleanup' and my
    series to clean up temporary file handling [1, 2].

  - "Part one": this series, which enables reading and writing
    incremental MIDXs, but does not have support for more advanced
    features like bitmap support or rewriting parts of the MIDX chain.

  - "Part two": the next series, which builds on support for multi-pack
    reachability bitmaps in an incremental MIDX world, meaning that each
    `*.midx` layer can have its own `*.bitmap`, and the bitmaps at each
    layer can be used together.

  - "Part three": which supports more advanced management of the MIDX
    chain, like compressing intermediate layers to avoid the chain
    growing too long.

Parts zero, one, and two all exist, and the first two have been shared
with the list. Part two exists in ttaylorr/git [3], but is excluded from
this series to keep the length manageable. I avoided sending this series
until I was confident that bitmaps worked on top of incremental MIDXs to
avoid designing ourselves into a corner.

Part three doesn't exist yet, but is straightforward to do on top. None
of the design decisions made in this series inhibit my goals for part
three.

[1]: https://lore.kernel.org/git/cover.1717023301.git.me@ttaylorr.com/
[2]: https://lore.kernel.org/git/cover.1717712358.git.me@ttaylorr.com/
[3]: https://github.com/ttaylorr/git/compare/tb/incremental-midx...ttaylorr:git:tb/incremental-midx-bitmaps

Taylor Blau (19):
  Documentation: describe incremental MIDX format
  midx: add new fields for incremental MIDX chains
  midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
  midx: teach `prepare_midx_pack()` about incremental MIDXs
  midx: teach `nth_midxed_object_oid()` about incremental MIDXs
  midx: teach `nth_bitmapped_pack()` about incremental MIDXs
  midx: introduce `bsearch_one_midx()`
  midx: teach `bsearch_midx()` about incremental MIDXs
  midx: teach `nth_midxed_offset()` about incremental MIDXs
  midx: teach `fill_midx_entry()` about incremental MIDXs
  midx: remove unused `midx_locate_pack()`
  midx: teach `midx_contains_pack()` about incremental MIDXs
  midx: teach `midx_preferred_pack()` about incremental MIDXs
  midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
  midx: support reading incremental MIDX chains
  midx: implement verification support for incremental MIDXs
  t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  t/t5313-pack-bounds-checks.sh: prepare for sub-directories
  midx: implement support for writing incremental MIDX chains

 Documentation/git-multi-pack-index.txt       |  11 +-
 Documentation/technical/multi-pack-index.txt | 100 +++++
 builtin/multi-pack-index.c                   |   2 +
 builtin/repack.c                             |   8 +-
 ci/run-build-and-tests.sh                    |   2 +-
 midx-write.c                                 | 293 +++++++++++--
 midx.c                                       | 410 ++++++++++++++++---
 midx.h                                       |  26 +-
 object-name.c                                |  99 ++---
 packfile.c                                   |  21 +-
 packfile.h                                   |   4 +
 t/README                                     |   6 +-
 t/helper/test-read-midx.c                    |  24 +-
 t/lib-bitmap.sh                              |   6 +-
 t/lib-midx.sh                                |  28 ++
 t/t0410-partial-clone.sh                     |   2 -
 t/t5310-pack-bitmaps.sh                      |   4 -
 t/t5313-pack-bounds-checks.sh                |   8 +-
 t/t5319-multi-pack-index.sh                  |  30 +-
 t/t5326-multi-pack-bitmaps.sh                |   4 +-
 t/t5327-multi-pack-bitmaps-rev.sh            |   6 +-
 t/t5332-multi-pack-reuse.sh                  |   2 +
 t/t5334-incremental-multi-pack-index.sh      |  46 +++
 t/t7700-repack.sh                            |  48 +--
 24 files changed, 935 insertions(+), 255 deletions(-)
 create mode 100755 t/t5334-incremental-multi-pack-index.sh


base-commit: 680474691b4639280a73baa0bb8792634f99f611

Comments

Taylor Blau June 6, 2024, 11:06 p.m. UTC | #1
On Thu, Jun 06, 2024 at 07:04:22PM -0400, Taylor Blau wrote:
> This series implements incremental MIDXs, which allow for storing
> a MIDX across multiple layers, each with their own distinct set of
> packs.

I forgot to mention, this series is based off a merge with current
master and 'tb/midx-write-cleanup'.

The latter topic is marked to merge into 'master', but hasn't been
pushed out yet, hence the dependency on a merge with that and 'master'
instead of just 'master'.

Thanks,
Taylor

Junio C Hamano June 7, 2024, 5:55 p.m. UTC | #2
Taylor Blau <me@ttaylorr.com> writes:

> Part three doesn't exist yet, but is straightforward to do on top. None
> of the design decisions made in this series inhibit my goals for part
> three.

Nice to always see the bigger picture to come to understand where
the current series fits, but the above is a bit peculiar thing to
say.  Of course there should be no design decision the currently
posted series makes that would block your future work---otherwise
you would not be posting it.  The real question is rather whether the
future and yet to be written work is still feasible after the design
decisions the current series made are found to be broken and need to
be revised (if it happens---but we do not know until we see reviews).

Thanks.

Junio C Hamano June 7, 2024, 6:33 p.m. UTC | #3
Taylor Blau <me@ttaylorr.com> writes:

> I forgot to mention, this series is based off a merge with current
> master and 'tb/midx-write-cleanup'.

I think I saw "am -3" fall back to three-way at around [17/19] for
t0410 while applying on that base, but it wasn't anything "am -3"
couldn't handle.

Queued.

Thanks.

Taylor Blau June 7, 2024, 8:29 p.m. UTC | #4
On Fri, Jun 07, 2024 at 11:33:13AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > I forgot to mention, this series is based off a merge with current
> > master and 'tb/midx-write-cleanup'.
>
> I think I saw "am -3" fall back to three-way at around [17/19] for
> t0410 while applying on that base, but it wasn't anything "am -3"
> couldn't handle.
>
> Queued.

Great, thanks. Sorry again for forgetting to mention it sooner.

Thanks,
Taylor

Taylor Blau June 7, 2024, 8:31 p.m. UTC | #5
On Fri, Jun 07, 2024 at 10:55:43AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Part three doesn't exist yet, but is straightforward to do on top. None
> > of the design decisions made in this series inhibit my goals for part
> > three.
>
> Nice to always see the bigger picture to come to understand where
> the current series fits, but the above is a bit peculiar thing to
> say.  Of course there should be no design decision the currently
> posted series makes that would block your future work---otherwise
> you would not be posting it.

Yeah. What I was trying to say was that part two actually exists, and
works in practice rather than just thinking that it would work without
having actually demonstrated anything ;-).

> The real question is rather whether the future and yet to be written work is
> still feasible after the design decisions the current series made are
> found to be broken and need to be revised (if it happens---but we do
> not know until we see reviews).

Indeed. I'll make sure that before I push out a new round that the
rebased part two still works as I expect it to.

Certainly all of this could be avoided by combining the two together,
but I think the result is just too large to review.

Thanks,
Taylor

Junio C Hamano June 25, 2024, 11:21 p.m. UTC | #6
Taylor Blau <me@ttaylorr.com> writes:

> This series implements incremental MIDXs, which allow for storing
> a MIDX across multiple layers, each with their own distinct set of
> packs.

So, ...  it is unfortunate that this hasn't seen any responses (not
even a question, let alone a proper review) and almost 3 weeks have
passed.

Any takers?

Thanks.

Elijah Newren June 26, 2024, 12:44 a.m. UTC | #7
On Tue, Jun 25, 2024 at 4:21 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Taylor Blau <me@ttaylorr.com> writes:
>
> > This series implements incremental MIDXs, which allow for storing
> > a MIDX across multiple layers, each with their own distinct set of
> > packs.
>
> So, ...  it is unfortunate that this hasn't seen any responses (not
> even a question, let alone a proper review) and almost 3 weeks have
> passed.
>
> Any takers?
>
> Thanks.

I've got it on my list, and I'll try to look at it soon.  It'll take a
bit longer since I'm not familiar with the area.