Message ID | b7cfb1267fdd7f50f414c9f79377cb338a0c1ab0.1744924321.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | pack-bitmap: enable lookup tables by default, misc. cleanups | expand |
Taylor Blau <me@ttaylorr.com> writes: > diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c > index 2a938466f5..6ad6f814e3 100644 > --- a/builtin/multi-pack-index.c > +++ b/builtin/multi-pack-index.c > @@ -142,6 +142,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv, > int ret; > > opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; > + opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE; > > git_config(git_multi_pack_index_write_config, NULL); > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index 3973267e9e..384fefbb1d 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -239,7 +239,7 @@ static enum { > WRITE_BITMAP_QUIET, > WRITE_BITMAP_TRUE, > } write_bitmap_index; > -static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE; > +static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE | BITMAP_OPT_LOOKUP_TABLE; Are these two hunks required to be kept in sync? If so, I am wondering what is the right approach to make sure they are. The definition of MIDX_WRITE_BITMAP_* flag bits in midx.h and BITMAP_OPT_* flag bits in write_bitmap_index enum are distinct two sets, and we need a way to somehow convert between them back and forth if we really wanted to ensure that these "default" values are kept in sync automatically. The reason I ask is mostly because I do not know offhand, and I would imagine that it would be hard for third-parties to verify, if these are only two places that need to be updated in order for lookup table extensions to be written by default, when somebody new wants to further update the default.
On Thu, Apr 17, 2025 at 03:04:00PM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c > > index 2a938466f5..6ad6f814e3 100644 > > --- a/builtin/multi-pack-index.c > > +++ b/builtin/multi-pack-index.c > > @@ -142,6 +142,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv, > > int ret; > > > > opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; > > + opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE; > > > > git_config(git_multi_pack_index_write_config, NULL); > > > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > > index 3973267e9e..384fefbb1d 100644 > > --- a/builtin/pack-objects.c > > +++ b/builtin/pack-objects.c > > @@ -239,7 +239,7 @@ static enum { > > WRITE_BITMAP_QUIET, > > WRITE_BITMAP_TRUE, > > } write_bitmap_index; > > -static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE; > > +static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE | BITMAP_OPT_LOOKUP_TABLE; > > Are these two hunks required to be kept in sync? They're not technically required to be in sync. It is OK for the midx bitmaps to have different options than the ones we make for packs. And in theory they could intentionally diverge, though in practice I don't think we (yet) have any extensions or options that would be more appropriate for one or the other. So if we did want to join them, I think it would make sense to still be able to use different flags for each situation, but initialize them from a common definition. > If so, I am wondering what is the right approach to make sure they > are. The definition of MIDX_WRITE_BITMAP_* flag bits in midx.h and > BITMAP_OPT_* flag bits in write_bitmap_index enum are distinct two > sets, and we need a way to somehow convert between them back and > forth if we really wanted to ensure that these "default" values are > kept in sync automatically. Yeah, I think that would be much simpler if the bitmap options were held separately from the other midx flags, and then we could use the same values for each (and a common "defaults" definition). Which is conceptually simple, but means we have to plumb a separate variable through a bunch of intermediate functions. See the (untested) patch below. I dunno if it's worth it for something that doesn't really change all that often. OTOH, it does remove the extra layer of translation in write_midx_bitmap(). diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index 2a938466f5..fb073a946a 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -10,6 +10,7 @@ #include "object-store-ll.h" #include "replace-object.h" #include "repository.h" +#include "pack-bitmap.h" #define BUILTIN_MIDX_WRITE_USAGE \ N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]" \ @@ -54,6 +55,7 @@ static struct opts_multi_pack_index { char *refs_snapshot; unsigned long batch_size; unsigned flags; + uint16_t bitmap_options; int stdin_packs; } opts; @@ -89,16 +91,16 @@ static int git_multi_pack_index_write_config(const char *var, const char *value, { if (!strcmp(var, "pack.writebitmaphashcache")) { if (git_config_bool(var, value)) - opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; + opts.bitmap_options |= BITMAP_OPT_HASH_CACHE; else - opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE; + opts.bitmap_options &= ~BITMAP_OPT_HASH_CACHE; } if (!strcmp(var, "pack.writebitmaplookuptable")) { if (git_config_bool(var, value)) - opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE; + opts.bitmap_options |= BITMAP_OPT_LOOKUP_TABLE; else - opts.flags &= ~MIDX_WRITE_BITMAP_LOOKUP_TABLE; + opts.bitmap_options &= ~BITMAP_OPT_LOOKUP_TABLE; } /* @@ -141,7 +143,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv, }; int ret; - opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; + opts.bitmap_options |= BITMAP_OPT_HASH_CACHE; git_config(git_multi_pack_index_write_config, NULL); @@ -167,7 +169,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv, ret = write_midx_file_only(repo, opts.object_dir, &packs, opts.preferred_pack, - opts.refs_snapshot, opts.flags); + opts.refs_snapshot, opts.flags, + opts.bitmap_options); string_list_clear(&packs, 0); free(opts.refs_snapshot); @@ -177,7 +180,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv, } ret = write_midx_file(repo, opts.object_dir, opts.preferred_pack, - opts.refs_snapshot, opts.flags); + opts.refs_snapshot, opts.flags, + opts.bitmap_options); free(opts.refs_snapshot); return ret; @@ -236,7 +240,8 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv, FREE_AND_NULL(options); - return expire_midx_packs(the_repository, opts.object_dir, opts.flags); + return expire_midx_packs(the_repository, opts.object_dir, opts.flags, + opts.bitmap_options); } static int cmd_multi_pack_index_repack(int argc, const char **argv, @@ -269,7 +274,8 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv, FREE_AND_NULL(options); return midx_repack(the_repository, opts.object_dir, - (size_t)opts.batch_size, opts.flags); + (size_t)opts.batch_size, opts.flags, + opts.bitmap_options); } int cmd_multi_pack_index(int argc, diff --git a/builtin/repack.c b/builtin/repack.c index f3330ade7b..69c9a8bc4c 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -1562,7 +1562,7 @@ int cmd_repack(int argc, if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL, 0)) flags |= MIDX_WRITE_INCREMENTAL; write_midx_file(the_repository, repo_get_object_directory(the_repository), - NULL, NULL, flags); + NULL, NULL, flags, 0); } cleanup: diff --git a/midx-write.c b/midx-write.c index 48a4dc5e94..5729b989cb 100644 --- a/midx-write.c +++ b/midx-write.c @@ -841,10 +841,10 @@ static int write_midx_bitmap(struct write_midx_context *ctx, struct packing_data *pdata, struct commit **commits, uint32_t commits_nr, - unsigned flags) + unsigned flags, + uint16_t options) { int ret, i; - uint16_t options = 0; struct bitmap_writer writer; struct pack_idx_entry **index; struct strbuf bitmap_name = STRBUF_INIT; @@ -859,12 +859,6 @@ static int write_midx_bitmap(struct write_midx_context *ctx, get_midx_filename_ext(ctx->repo->hash_algo, &bitmap_name, object_dir, midx_hash, MIDX_EXT_BITMAP); - if (flags & MIDX_WRITE_BITMAP_HASH_CACHE) - options |= BITMAP_OPT_HASH_CACHE; - - if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE) - options |= BITMAP_OPT_LOOKUP_TABLE; - /* * Build the MIDX-order index based on pdata.objects (which is already * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of @@ -1070,7 +1064,8 @@ static int write_midx_internal(struct repository *r, const char *object_dir, struct string_list *packs_to_drop, const char *preferred_pack_name, const char *refs_snapshot, - unsigned flags) + unsigned flags, + uint16_t bitmap_options) { struct strbuf midx_name = STRBUF_INIT; unsigned char midx_hash[GIT_MAX_RAWSZ]; @@ -1433,7 +1428,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir, if (write_midx_bitmap(&ctx, object_dir, midx_hash, &pdata, commits, commits_nr, - flags) < 0) { + flags, bitmap_options) < 0) { error(_("could not write multi-pack bitmap")); result = 1; clear_packing_data(&pdata); @@ -1529,23 +1524,27 @@ static int write_midx_internal(struct repository *r, const char *object_dir, int write_midx_file(struct repository *r, const char *object_dir, const char *preferred_pack_name, - const char *refs_snapshot, unsigned flags) + const char *refs_snapshot, unsigned flags, + uint16_t bitmap_options) { return write_midx_internal(r, object_dir, NULL, NULL, preferred_pack_name, refs_snapshot, - flags); + flags, bitmap_options); } int write_midx_file_only(struct repository *r, const char *object_dir, struct string_list *packs_to_include, const char *preferred_pack_name, - const char *refs_snapshot, unsigned flags) + const char *refs_snapshot, unsigned flags, + uint16_t bitmap_options) { return write_midx_internal(r, object_dir, packs_to_include, NULL, - preferred_pack_name, refs_snapshot, flags); + preferred_pack_name, refs_snapshot, flags, + bitmap_options); } -int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags) +int expire_midx_packs(struct repository *r, const char *object_dir, + unsigned flags, uint16_t bitmap_options) { uint32_t i, *count, result = 0; struct string_list packs_to_drop = STRING_LIST_INIT_DUP; @@ -1603,7 +1602,8 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla if (packs_to_drop.nr) result = write_midx_internal(r, object_dir, NULL, - &packs_to_drop, NULL, NULL, flags); + &packs_to_drop, NULL, NULL, flags, + bitmap_options); string_list_clear(&packs_to_drop, 0); @@ -1718,7 +1718,8 @@ static void fill_included_packs_batch(struct repository *r, free(pack_info); } -int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags) +int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, + unsigned flags, uint16_t bitmap_options) { int result = 0; uint32_t i, packs_to_repack = 0; @@ -1801,7 +1802,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, } result = write_midx_internal(r, object_dir, NULL, NULL, NULL, NULL, - flags); + flags, bitmap_options); cleanup: free(include_pack); diff --git a/midx.h b/midx.h index 9d1374cbd5..5a79302a7a 100644 --- a/midx.h +++ b/midx.h @@ -81,8 +81,6 @@ struct multi_pack_index { #define MIDX_PROGRESS (1 << 0) #define MIDX_WRITE_REV_INDEX (1 << 1) #define MIDX_WRITE_BITMAP (1 << 2) -#define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3) -#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4) #define MIDX_WRITE_INCREMENTAL (1 << 5) #define MIDX_EXT_REV "rev" @@ -131,15 +129,16 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i */ int write_midx_file(struct repository *r, const char *object_dir, const char *preferred_pack_name, const char *refs_snapshot, - unsigned flags); + unsigned flags, uint16_t bitmap_options); int write_midx_file_only(struct repository *r, const char *object_dir, struct string_list *packs_to_include, const char *preferred_pack_name, - const char *refs_snapshot, unsigned flags); + const char *refs_snapshot, unsigned flags, + uint16_t bitmap_options); void clear_midx_file(struct repository *r); int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags); -int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags); -int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags); +int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags, uint16_t bitmap_options); +int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags, uint16_t bitmap_options); void close_midx(struct multi_pack_index *m);
Jeff King <peff@peff.net> writes: > They're not technically required to be in sync. It is OK for the midx > bitmaps to have different options than the ones we make for packs. And > in theory they could intentionally diverge, though in practice I don't > think we (yet) have any extensions or options that would be more > appropriate for one or the other. > > So if we did want to join them, I think it would make sense to still be > able to use different flags for each situation, but initialize them from > a common definition. Thanks for great explanation---I guess it is not worth pursuing, then. It is not like it would make the system misbehave when two are set differently.
On Fri, Apr 18, 2025 at 08:44:29AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > They're not technically required to be in sync. It is OK for the midx > > bitmaps to have different options than the ones we make for packs. And > > in theory they could intentionally diverge, though in practice I don't > > think we (yet) have any extensions or options that would be more > > appropriate for one or the other. > > > > So if we did want to join them, I think it would make sense to still be > > able to use different flags for each situation, but initialize them from > > a common definition. > > Thanks for great explanation---I guess it is not worth pursuing, > then. It is not like it would make the system misbehave when two > are set differently. Hah, Peff beat me to it. I saw your reply last night and was going to write you a very similar response. I think the summary from my perspective would be that: the two could fall out-of-sync intentionally if we want the two commands to ever have different defaults. Tangentially we could use some common "bitmap_flags" field whose bits are defined in pack-bitmap.h and used in both places. The latter is a bit awkward currently because the current "flags" that we pass into the MIDX machinery from the builtin all have MIDX-specific meanings. So we would have to either make sure that MIDX uses separate bit positions (which is awful and far too fragile for my comfort/taste) or store them as a separate set of flags (I think what Peff is getting at above). Thanks, Taylor
diff --git a/Documentation/config/pack.adoc b/Documentation/config/pack.adoc index da527377fa..ba538e3d9c 100644 --- a/Documentation/config/pack.adoc +++ b/Documentation/config/pack.adoc @@ -191,7 +191,7 @@ pack.writeBitmapLookupTable:: bitmap index (if one is written). This table is used to defer loading individual bitmaps as late as possible. This can be beneficial in repositories that have relatively large bitmap - indexes. Defaults to false. + indexes. Defaults to true. pack.readReverseIndex:: When true, git will read any .rev file(s) that may be available diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index 2a938466f5..6ad6f814e3 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -142,6 +142,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv, int ret; opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; + opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE; git_config(git_multi_pack_index_write_config, NULL); diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 3973267e9e..384fefbb1d 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -239,7 +239,7 @@ static enum { WRITE_BITMAP_QUIET, WRITE_BITMAP_TRUE, } write_bitmap_index; -static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE; +static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE | BITMAP_OPT_LOOKUP_TABLE; static int exclude_promisor_objects; static int exclude_promisor_objects_best_effort;
The lookup table extension for reachability bitmaps was first introduced via 3fe0121479 (Merge branch 'ac/bitmap-lookup-table', 2022-09-05). For each bitmapped commit, the lookup table encodes three unsigned integers: - The pack position of the commit. - An offset within the *.bitmap itself where that commit's bitmap lives. - An index into the same table where information on that commit's XOR pair can be found. Lookup tables make bitmap operations faster because they no longer have to read the entire *.bitmap file to discover what commits have corresponding reachability bitmaps. When bitmap lookup tables were first introduced, we established a baseline level of performance in p5310 with and without lookup tables. Here is the baseline without: Test this tree ----------------------------------------------------------------------- 5310.6: simulated clone 14.04(5.78+1.79) 5310.7: simulated fetch 1.95(3.05+0.20) 5310.8: pack to file (bitmap) 44.73(20.55+7.45) 5310.9: rev-list (commits) 0.78(0.46+0.10) 5310.10: rev-list (objects) 4.07(3.97+0.08) 5310.11: rev-list with tag negated via --not 0.06(0.02+0.03) --all (objects) 5310.12: rev-list with negative tag (objects) 0.21(0.15+0.05) 5310.13: rev-list count with blob:none 0.24(0.17+0.06) 5310.14: rev-list count with blob:limit=1k 7.07(5.92+0.48) 5310.15: rev-list count with tree:0 0.25(0.17+0.07) 5310.16: simulated partial clone 5.67(3.28+0.64) 5310.18: clone (partial bitmap) 16.05(8.34+1.86) 5310.19: pack to file (partial bitmap) 59.76(27.22+7.43) 5310.20: rev-list with tree filter (partial bitmap) 0.90(0.18+0.16) , and here is the same set of tests, this time with the lookup table enabled: Test this tree ----------------------------------------------------------------------- 5310.26: simulated clone 13.69(5.72+1.78) 5310.27: simulated fetch 1.84(3.02+0.16) 5310.28: pack to file (bitmap) 45.63(20.67+7.50) 5310.29: rev-list (commits) 0.56(0.39+0.8) 5310.30: rev-list (objects) 3.77(3.74+0.08) 5310.31: rev-list with tag negated via --not 0.05(0.02+0.03) --all (objects) 5310.32: rev-list with negative tag (objects) 0.21(0.15+0.05) 5310.33: rev-list count with blob:none 0.23(0.17+0.05) 5310.34: rev-list count with blob:limit=1k 6.65(5.72+0.40) 5310.35: rev-list count with tree:0 0.23(0.16+0.06) 5310.36: simulated partial clone 5.57(3.26+0.59) 5310.38: clone (partial bitmap) 15.89(8.39+1.84) 5310.39: pack to file (partial bitmap) 58.32(27.55+7.47) 5310.40: rev-list with tree filter (partial bitmap) 0.73(0.18+0.15) (All numbers here come from a ~2022-era copy of the kernel, via Abhradeep Chakraborty who implemented the lookup table extension). In the almost three years since lookup tables were introduced, GitHub has used them in production without issue, taking advantage of the above performance benefits along the way. Since this feature has had sufficient time to flush out any bugs and/or performance regressions, let's enable it by default so that all bitmap users can reap similar performance benefits. [1]: https://lore.kernel.org/git/pull.1266.git.1655728395.gitgitgadget@gmail.com/ Signed-off-by: Taylor Blau <me@ttaylorr.com> --- Documentation/config/pack.adoc | 2 +- builtin/multi-pack-index.c | 1 + builtin/pack-objects.c | 2 +- 3 files changed, 3 insertions(+), 2 deletions(-)