mbox series

[v3,00/19] midx: incremental multi-pack indexes, part one

Message ID cover.1722958595.git.me@ttaylorr.com (mailing list archive)
Headers show
Series midx: incremental multi-pack indexes, part one | expand

Message

Taylor Blau Aug. 6, 2024, 3:36 p.m. UTC
This series implements incremental MIDXs, which allow for storing
a MIDX across multiple layers, each with their own distinct set of
packs.

This round is also similar to the previous one, but is rebased on
current 'master' (406f326d27 (The second batch, 2024-08-01)) and has
been updated in response to review from Peff on the previous round.

As usual, a range-diff is below, but the main changes since last time
are as follows:

  - Documentation improvements to clarify what happens when both an
    incremental- and non-incremental MIDX are both present in a
    repository.

  - Commit message typofix on 3/19 to fix an error in one of the
    technical examples.

  - Dropped a custom 'local_pack_int_id' in 4/19 to make the remaining
    diff easier to read.

  - Minor bugfix in 7/19 where we incorrectly terminated the object
    abbreviation disambiguation step for incremental MIDXs.

  - Various additional bits of information in the commit message to
    explain anything that was subtle.

Thanks in advance for any review! :-)

Taylor Blau (19):
  Documentation: describe incremental MIDX format
  midx: add new fields for incremental MIDX chains
  midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
  midx: teach `prepare_midx_pack()` about incremental MIDXs
  midx: teach `nth_midxed_object_oid()` about incremental MIDXs
  midx: teach `nth_bitmapped_pack()` about incremental MIDXs
  midx: introduce `bsearch_one_midx()`
  midx: teach `bsearch_midx()` about incremental MIDXs
  midx: teach `nth_midxed_offset()` about incremental MIDXs
  midx: teach `fill_midx_entry()` about incremental MIDXs
  midx: remove unused `midx_locate_pack()`
  midx: teach `midx_contains_pack()` about incremental MIDXs
  midx: teach `midx_preferred_pack()` about incremental MIDXs
  midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
  midx: support reading incremental MIDX chains
  midx: implement verification support for incremental MIDXs
  t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  t/t5313-pack-bounds-checks.sh: prepare for sub-directories
  midx: implement support for writing incremental MIDX chains

 Documentation/git-multi-pack-index.txt       |  11 +-
 Documentation/technical/multi-pack-index.txt | 103 +++++
 builtin/multi-pack-index.c                   |   2 +
 builtin/repack.c                             |   8 +-
 ci/run-build-and-tests.sh                    |   2 +-
 midx-write.c                                 | 326 ++++++++++++---
 midx.c                                       | 405 ++++++++++++++++---
 midx.h                                       |  26 +-
 object-name.c                                |  99 ++---
 packfile.c                                   |  21 +-
 packfile.h                                   |   4 +
 t/README                                     |   6 +-
 t/helper/test-read-midx.c                    |  24 +-
 t/lib-bitmap.sh                              |   6 +-
 t/lib-midx.sh                                |  28 ++
 t/t0410-partial-clone.sh                     |   2 -
 t/t5310-pack-bitmaps.sh                      |   4 -
 t/t5313-pack-bounds-checks.sh                |   8 +-
 t/t5319-multi-pack-index.sh                  |  30 +-
 t/t5326-multi-pack-bitmaps.sh                |   4 +-
 t/t5327-multi-pack-bitmaps-rev.sh            |   6 +-
 t/t5332-multi-pack-reuse.sh                  |   2 +
 t/t5334-incremental-multi-pack-index.sh      |  46 +++
 t/t7700-repack.sh                            |  48 +--
 24 files changed, 960 insertions(+), 261 deletions(-)
 create mode 100755 t/t5334-incremental-multi-pack-index.sh

Range-diff against v2:
 1:  014588b3ec !  1:  90b21b11ed Documentation: describe incremental MIDX format
    @@ Documentation/technical/multi-pack-index.txt: Design Details
     +extending the incremental MIDX format to support reachability bitmaps.
     +The design below specifically takes this into account, and support for
     +reachability bitmaps will be added in a future patch series. It is
    -+omitted from this series for the same reason as above.
    ++omitted from the current implementation for the same reason as above.
     ++
     +In brief, to support reachability bitmaps with the incremental MIDX
     +feature, the concept of the pseudo-pack order is extended across each
    @@ Documentation/technical/multi-pack-index.txt: Design Details
     +multi-pack-index chain. The `multi-pack-index-$H2.midx` file contains
     +the second layer of the chain, and so on.
     +
    ++When both an incremental- and non-incremental MIDX are present, the
    ++non-incremental MIDX is always read first.
    ++
     +=== Object positions for incremental MIDXs
     +
     +In the original multi-pack-index design, we refer to objects via their
 2:  337ebc6de7 =  2:  0d3b19c59f midx: add new fields for incremental MIDX chains
 3:  f449a72877 !  3:  5cd742b677 midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
    @@ Commit message
           objects contained in all layers of the incremental MIDX chain, not any
           particular layer. For example, consider MIDX chain with two individual
           MIDXs, one with 4 objects and another with 3 objects. If the MIDX with
    -      4 objects appears earlier in the chain, then asking for pack "6" would
    +      4 objects appears earlier in the chain, then asking for object 6 would
           return the second object in the MIDX with 3 objects.
     
         [^2]: Building on the previous example, asking for object 6 in a MIDX
 4:  f88569c819 !  4:  372104c73d midx: teach `prepare_midx_pack()` about incremental MIDXs
    @@ midx.c: static uint32_t midx_for_object(struct multi_pack_index **_m, uint32_t p
      		die(_("bad pack-int-id: %u (%u total packs)"),
     -		    pack_int_id, m->num_packs);
     +		    pack_int_id, m->num_packs + m->num_packs_in_base);
    - 
    --	if (m->packs[pack_int_id])
    ++
     +	*_m = m;
     +
     +	return pack_int_id - m->num_packs_in_base;
    @@ midx.c: static uint32_t midx_for_object(struct multi_pack_index **_m, uint32_t p
     +{
     +	struct strbuf pack_name = STRBUF_INIT;
     +	struct packed_git *p;
    -+	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
     +
    -+	if (m->packs[local_pack_int_id])
    ++	pack_int_id = midx_for_pack(&m, pack_int_id);
    + 
    + 	if (m->packs[pack_int_id])
      		return 0;
    - 
    - 	strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir,
    --		    m->pack_names[pack_int_id]);
    -+		    m->pack_names[local_pack_int_id]);
    - 
    - 	p = add_packed_git(pack_name.buf, pack_name.len, m->local);
    - 	strbuf_release(&pack_name);
    -@@ midx.c: int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t
    - 		return 1;
    - 
    - 	p->multi_pack_index = 1;
    --	m->packs[pack_int_id] = p;
    -+	m->packs[local_pack_int_id] = p;
    - 	install_packed_git(r, p);
    - 	list_add_tail(&p->mru, &r->objects->packed_git_mru);
    - 
 5:  ec57ff4349 =  5:  e68a3ceff9 midx: teach `nth_midxed_object_oid()` about incremental MIDXs
 6:  650b8c8c21 !  6:  ff2d7bc5ca midx: teach `nth_bitmapped_pack()` about incremental MIDXs
    @@ Commit message
         ID. Likewise, when reading the 'BTMP' chunk, use the MIDX-local offset
         when accessing the data within that chunk.
     
    +    (Note that the both the call to prepare_midx_pack() and the assignment
    +    of bp->pack_int_id both care about the global pack_int_id, so avoid
    +    shadowing the given 'pack_int_id' parameter).
    +
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## midx.c ##
 7:  bfd1dadbf1 !  7:  32c3fceada midx: introduce `bsearch_one_midx()`
    @@ object-name.c: static int match_hash(unsigned len, const unsigned char *a, const
      
     -	if (!num)
     -		return;
    -+		num = m->num_objects + m->num_objects_in_base;
    ++		if (!m->num_objects)
    ++			continue;
      
     -	bsearch_midx(&ds->bin_pfx, m, &first);
    -+		if (!num)
    -+			continue;
    ++		num = m->num_objects + m->num_objects_in_base;
      
     -	/*
     -	 * At this point, "first" is the location of the lowest object
 8:  38bd45bd24 =  8:  16db6c98ce midx: teach `bsearch_midx()` about incremental MIDXs
 9:  342ed56033 =  9:  761c7c59ba midx: teach `nth_midxed_offset()` about incremental MIDXs
10:  2b335c45ae = 10:  8366456d29 midx: teach `fill_midx_entry()` about incremental MIDXs
11:  22de5898f3 = 11:  909d927c47 midx: remove unused `midx_locate_pack()`
12:  fb60f2b022 = 12:  71127601b5 midx: teach `midx_contains_pack()` about incremental MIDXs
13:  38b642d404 = 13:  2f98ebb141 midx: teach `midx_preferred_pack()` about incremental MIDXs
14:  594386da10 ! 14:  550ae2dc93 midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
    @@ Commit message
             MIDX layers when dealing with an incremental MIDX chain by calling
             itself when given a MIDX with a non-NULL `base_midx`.
     
    +    Note that after 0c5a62f14b (midx-write.c: do not read existing MIDX with
    +    `packs_to_include`, 2024-06-11), we do not use this function with an
    +    existing MIDX (incremental or not) when generating a MIDX with
    +    --stdin-packs, and likewise for incremental MIDXs.
    +
    +    But it is still used when adding the fanout table from an incremental
    +    MIDX when generating a non-incremental MIDX (without --stdin-packs, of
    +    course).
    +
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## midx-write.c ##
15:  dad130799c ! 15:  9ae1bc415e midx: support reading incremental MIDX chains
    @@ Commit message
         in the commit after next.)
     
         The core of this change involves following the order specified in the
    -    MIDX chain and opening up MIDXs in the chain one-by-one, adding them to
    -    the previous layer's `->base_midx` pointer at each step.
    +    MIDX chain in reverse and opening up MIDXs in the chain one-by-one,
    +    adding them to the previous layer's `->base_midx` pointer at each step.
     
         In order to implement this, the `load_multi_pack_index()` function is
         taught to call a new `load_multi_pack_index_chain()` function if loading
16:  ad976ef413 = 16:  3d4181df51 midx: implement verification support for incremental MIDXs
17:  23912425bf = 17:  3b268f91bf t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
18:  814da1916d = 18:  09d74f8942 t/t5313-pack-bounds-checks.sh: prepare for sub-directories
19:  e2b5961b45 = 19:  5d467d38a8 midx: implement support for writing incremental MIDX chains

base-commit: 406f326d271e0bacecdb00425422c5fa3f314930

Comments

Jeff King Aug. 12, 2024, 2:27 p.m. UTC | #1
On Tue, Aug 06, 2024 at 11:36:36AM -0400, Taylor Blau wrote:

> As usual, a range-diff is below, but the main changes since last time
> are as follows:
> 
>   - Documentation improvements to clarify what happens when both an
>     incremental- and non-incremental MIDX are both present in a
>     repository.
> 
>   - Commit message typofix on 3/19 to fix an error in one of the
>     technical examples.
> 
>   - Dropped a custom 'local_pack_int_id' in 4/19 to make the remaining
>     diff easier to read.
> 
>   - Minor bugfix in 7/19 where we incorrectly terminated the object
>     abbreviation disambiguation step for incremental MIDXs.
> 
>   - Various additional bits of information in the commit message to
>     explain anything that was subtle.

This all looks good to me.

I read over your responses to my previous review, but as you answered
all of my questions I didn't respond to each individually. :)

I looked over the changes in this iteration and they address all the
small points I brought up. In the bigger picture, I do think there are
probably still issues lurking around the global/local pack and objection
position ids. But where you have the series now seems like a good
cut-off point:

  - I suspect pack_pos_to_midx() might need to be adjusted. But it's not
    an issue that can be triggered until until we support incremental
    midx bitmaps. And that should definitely go in its own series (and
    preparing is kind of pointless because we don't know what the
    correct interface will be until mid-way through that topic).

  - I won't be surprised if there is some global/local bug that shakes
    out in the long run. But I don't have a clever way of preventing it
    or avoiding the need to deal with the distinction. So I think the
    best path is forward, and to let the shaking commence.

    The most important thing to that the new on-disk files are sane,
    since those are hard to walk back later. A simple bug in the code
    can always be fixed. Likewise, the changes here are unlikely to
    create a bug for anybody using a single unchained midx. So even if
    there is a bug, it will only be triggerable for people using the
    experimental mode.

-Peff