mbox series

[v2,0/7] pack-revindex: enable on-disk reverse indexes by default

Message ID cover.1681338013.git.me@ttaylorr.com (mailing list archive)
Headers show
Series pack-revindex: enable on-disk reverse indexes by default | expand

Message

Taylor Blau April 12, 2023, 10:20 p.m. UTC
Here is a (tiny) reroll of my series to enable pack reverse-indexes to
be written to disk by default instead of computed on-the-fly in memory.

The original cover letter[1] and commits herein have all of the gore-y
details. Not much has changed since last time, except:

  - squashing in a patch from Stolee to propagate a `struct repository
    *` a little further out from `load_pack_revindex()`

  - a tweak to the linux-TEST-vars CI job to continue running it with
    GIT_TEST_NO_WRITE_REV_INDEX

  - and an additional benchmark to demonstrate the performance of `git
    cat-file --batch-check='%(objectsize:disk)' --batch--all-objects
    --unordered` with and without on-disk reverse indexes

As always, a range-diff is included below for convenience. Thanks in
advance for taking (another) look!

[1]: https://lore.kernel.org/git/cover.1681166596.git.me@ttaylorr.com/

Taylor Blau (7):
  pack-write.c: plug a leak in stage_tmp_packfiles()
  t5325: mark as leak-free
  pack-revindex: make `load_pack_revindex` take a repository
  pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
  pack-revindex: introduce `pack.readReverseIndex`
  config: enable `pack.writeReverseIndex` by default
  t: invert `GIT_TEST_WRITE_REV_INDEX`

 Documentation/config/pack.txt     |  8 +++++++-
 builtin/index-pack.c              |  5 +++--
 builtin/pack-objects.c            |  5 +++--
 ci/run-build-and-tests.sh         |  2 +-
 pack-bitmap.c                     | 23 +++++++++++++----------
 pack-revindex.c                   | 12 +++++++++---
 pack-revindex.h                   |  6 ++++--
 pack-write.c                      |  2 ++
 packfile.c                        |  2 +-
 repo-settings.c                   |  1 +
 repository.h                      |  1 +
 t/README                          |  2 +-
 t/perf/p5312-pack-bitmaps-revs.sh |  3 +--
 t/t5325-reverse-index.sh          | 16 +++++++++++++++-
 14 files changed, 62 insertions(+), 26 deletions(-)

Range-diff against v1:
-:  ----------- > 1:  18be29c3988 pack-write.c: plug a leak in stage_tmp_packfiles()
-:  ----------- > 2:  affb5e2574b t5325: mark as leak-free
1:  be4faf11011 ! 3:  687a9a58924 pack-revindex: make `load_pack_revindex` take a repository
    @@ Commit message
         to `load_pack_revindex`, and update all callers to pass the correct
         instance (in all cases, `the_repository`).
     
    -    Signed-off-by: Taylor Blauy <me@ttaylorr.com>
    +    In certain instances, a new function-local variable is introduced to
    +    take the place of a `struct repository *` argument to the function
    +    itself to avoid propagating the new parameter even further throughout
    +    the tree.
    +
    +    Co-authored-by: Derrick Stolee <derrickstolee@github.com>
    +    Signed-off-by: Derrick Stolee <derrickstolee@github.com>
    +    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## pack-bitmap.c ##
    +@@ pack-bitmap.c: static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
    + 	return 0;
    + }
    + 
    +-static int load_reverse_index(struct bitmap_index *bitmap_git)
    ++static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_git)
    + {
    + 	if (bitmap_is_midx(bitmap_git)) {
    + 		uint32_t i;
     @@ pack-bitmap.c: static int load_reverse_index(struct bitmap_index *bitmap_git)
      		 * since we will need to make use of them in pack-objects.
      		 */
      		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
     -			ret = load_pack_revindex(bitmap_git->midx->packs[i]);
    -+			ret = load_pack_revindex(the_repository,
    -+						 bitmap_git->midx->packs[i]);
    ++			ret = load_pack_revindex(r, bitmap_git->midx->packs[i]);
      			if (ret)
      				return ret;
      		}
      		return 0;
      	}
     -	return load_pack_revindex(bitmap_git->pack);
    -+	return load_pack_revindex(the_repository, bitmap_git->pack);
    ++	return load_pack_revindex(r, bitmap_git->pack);
      }
      
    - static int load_bitmap(struct bitmap_index *bitmap_git)
    +-static int load_bitmap(struct bitmap_index *bitmap_git)
    ++static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
    + {
    + 	assert(bitmap_git->map);
    + 
    + 	bitmap_git->bitmaps = kh_init_oid_map();
    + 	bitmap_git->ext_index.positions = kh_init_oid_pos();
    + 
    +-	if (load_reverse_index(bitmap_git))
    ++	if (load_reverse_index(r, bitmap_git))
    + 		goto failed;
    + 
    + 	if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
    +@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_git(struct repository *r)
    + {
    + 	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
    + 
    +-	if (!open_bitmap(r, bitmap_git) && !load_bitmap(bitmap_git))
    ++	if (!open_bitmap(r, bitmap_git) && !load_bitmap(r, bitmap_git))
    + 		return bitmap_git;
    + 
    + 	free_bitmap_index(bitmap_git);
    +@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_git(struct repository *r)
    + 
    + struct bitmap_index *prepare_midx_bitmap_git(struct multi_pack_index *midx)
    + {
    ++	struct repository *r = the_repository;
    + 	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
    + 
    +-	if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(bitmap_git))
    ++	if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(r, bitmap_git))
    + 		return bitmap_git;
    + 
    + 	free_bitmap_index(bitmap_git);
    +@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
    + 	 * from disk. this is the point of no return; after this the rev_list
    + 	 * becomes invalidated and we must perform the revwalk through bitmaps
    + 	 */
    +-	if (load_bitmap(bitmap_git) < 0)
    ++	if (load_bitmap(revs->repo, bitmap_git) < 0)
    + 		goto cleanup;
    + 
    + 	object_array_clear(&revs->pending);
    +@@ pack-bitmap.c: int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
    + 				       uint32_t *entries,
    + 				       struct bitmap **reuse_out)
    + {
    ++	struct repository *r = the_repository;
    + 	struct packed_git *pack;
    + 	struct bitmap *result = bitmap_git->result;
    + 	struct bitmap *reuse;
    +@@ pack-bitmap.c: int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
    + 
    + 	assert(result);
    + 
    +-	load_reverse_index(bitmap_git);
    ++	load_reverse_index(r, bitmap_git);
    + 
    + 	if (bitmap_is_midx(bitmap_git))
    + 		pack = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
    +@@ pack-bitmap.c: int rebuild_bitmap(const uint32_t *reposition,
    + uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
    + 				struct packing_data *mapping)
    + {
    ++	struct repository *r = the_repository;
    + 	uint32_t i, num_objects;
    + 	uint32_t *reposition;
    + 
    + 	if (!bitmap_is_midx(bitmap_git))
    +-		load_reverse_index(bitmap_git);
    ++		load_reverse_index(r, bitmap_git);
    + 	else if (load_midx_revindex(bitmap_git->midx) < 0)
    + 		BUG("rebuild_existing_bitmaps: missing required rev-cache "
    + 		    "extension");
     
      ## pack-revindex.c ##
     @@ pack-revindex.c: static int load_pack_revindex_from_disk(struct packed_git *p)
2:  0f368e2347e = 4:  8eec5bacd3a pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
3:  6f692d470cb ! 5:  a62fc3e4ec1 pack-revindex: introduce `pack.readReverseIndex`
    @@ Commit message
               'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' ran
                 2.06 ± 0.02 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects'
     
    +    Luckily, the results when running `git cat-file` with `--unordered` are
    +    closer together:
    +
    +        $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects'
    +        Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects
    +          Time (mean ± σ):      5.066 s ±  0.105 s    [User: 4.792 s, System: 0.274 s]
    +          Range (min … max):    4.943 s …  5.220 s    10 runs
    +
    +        Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects
    +          Time (mean ± σ):      6.193 s ±  0.069 s    [User: 5.937 s, System: 0.255 s]
    +          Range (min … max):    6.145 s …  6.356 s    10 runs
    +
    +        Summary
    +          'git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' ran
    +            1.22 ± 0.03 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects'
    +
         Because the equilibrium point between these two is highly machine- and
         repository-dependent, allow users to configure whether or not they will
         read any ".rev" file(s) with this configuration knob.
4:  56a0fc0098e = 6:  f8298fb0bac config: enable `pack.writeReverseIndex` by default
5:  9c803799588 ! 7:  edff6a80c63 t: invert `GIT_TEST_WRITE_REV_INDEX`
    @@ Commit message
         disable writing ".rev" files, thereby running the test suite in a mode
         where the reverse index is generated from scratch.
     
    -    This ensures that we are still running and exercising Git's behavior
    -    when forced to generate reverse indexes from scratch.
    +    This ensures that, when GIT_TEST_NO_WRITE_REV_INDEX is set to some
    +    spelling of "true", we are still running and exercising Git's behavior
    +    when forced to generate reverse indexes from scratch. Do so by setting
    +    it in the linux-TEST-vars CI run to ensure that we are maintaining good
    +    coverage of this now-legacy code.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ ci/run-build-and-tests.sh: linux-TEST-vars)
      	export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
      	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
     -	export GIT_TEST_WRITE_REV_INDEX=1
    ++	export GIT_TEST_NO_WRITE_REV_INDEX=1
      	export GIT_TEST_CHECKOUT_WORKERS=2
      	;;
      linux-clang)

Comments

Derrick Stolee April 13, 2023, 1:40 p.m. UTC | #1
On 4/12/2023 6:20 PM, Taylor Blau wrote:
> Here is a (tiny) reroll of my series to enable pack reverse-indexes to
> be written to disk by default instead of computed on-the-fly in memory.
> 
> The original cover letter[1] and commits herein have all of the gore-y
> details. Not much has changed since last time, except:
> 
>   - squashing in a patch from Stolee to propagate a `struct repository
>     *` a little further out from `load_pack_revindex()`
> 
>   - a tweak to the linux-TEST-vars CI job to continue running it with
>     GIT_TEST_NO_WRITE_REV_INDEX
> 
>   - and an additional benchmark to demonstrate the performance of `git
>     cat-file --batch-check='%(objectsize:disk)' --batch--all-objects
>     --unordered` with and without on-disk reverse indexes

Thanks for addressing my concerns from v1. This version LGTM.

-Stolee