Message ID | be8f47e13c612f2fbe4d5f4f49794529b9424664.1631049462.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | pack-bitmap: permute existing namehash values | expand |
On Tue, Sep 07 2021, Taylor Blau wrote: > +static int git_multi_pack_index_write_config(const char *var, const char *value, > + void *cb) > +{ > + if (!strcmp(var, "pack.writebitmaphashcache")) { > + if (git_config_bool(var, value)) > + opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; > + else > + opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE; > + } > + > + /* > + * No need to fall-back to 'git_default_config', since this was already > + * called in 'cmd_multi_pack_index()'. > + */ > + return 0; > +} > + > static int cmd_multi_pack_index_write(int argc, const char **argv) > { > struct option *options; > @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) > OPT_END(), > }; > > + opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; > + > + git_config(git_multi_pack_index_write_config, NULL); > + Since this is a write-only config option it would seem more logical to just call git_config() once, and have a git_multip_pack_index_config, which then would fall back on git_default_config, so we iterate it once, and no need for a comment about the oddity.
On Wed, Sep 08, 2021 at 03:40:19AM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Sep 07 2021, Taylor Blau wrote: > > > +static int git_multi_pack_index_write_config(const char *var, const char *value, > > + void *cb) > > +{ > > + if (!strcmp(var, "pack.writebitmaphashcache")) { > > + if (git_config_bool(var, value)) > > + opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; > > + else > > + opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE; > > + } > > + > > + /* > > + * No need to fall-back to 'git_default_config', since this was already > > + * called in 'cmd_multi_pack_index()'. > > + */ > > + return 0; > > +} > > + > > static int cmd_multi_pack_index_write(int argc, const char **argv) > > { > > struct option *options; > > @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) > > OPT_END(), > > }; > > > > + opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; > > + > > + git_config(git_multi_pack_index_write_config, NULL); > > + > > Since this is a write-only config option it would seem more logical to > just call git_config() once, and have a git_multip_pack_index_config, > which then would fall back on git_default_config, so we iterate it once, > and no need for a comment about the oddity. Perhaps, but I'm not crazy about each sub-command having to call git_config() itself when 'write' is the only one that actually has any values to read. FWIW, the commit-graph builtin does the same thing as is written here (calling git_config() twice, once in cmd_commit_graph() with git_default_config as the callback and again in cmd_commit_graph_write() with git_commit_graph_write_config as the callback). So I'm not opposed to cleaning it up, but I'd rather be consistent with the existing behavior. To be honest, I'm not at all convinced that reading the config twice is a bottleneck here when compared to generating a MIDX. Thanks, Taylor
On Tue, Sep 07 2021, Taylor Blau wrote: > On Wed, Sep 08, 2021 at 03:40:19AM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> On Tue, Sep 07 2021, Taylor Blau wrote: >> >> > +static int git_multi_pack_index_write_config(const char *var, const char *value, >> > + void *cb) >> > +{ >> > + if (!strcmp(var, "pack.writebitmaphashcache")) { >> > + if (git_config_bool(var, value)) >> > + opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; >> > + else >> > + opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE; >> > + } >> > + >> > + /* >> > + * No need to fall-back to 'git_default_config', since this was already >> > + * called in 'cmd_multi_pack_index()'. >> > + */ >> > + return 0; >> > +} >> > + >> > static int cmd_multi_pack_index_write(int argc, const char **argv) >> > { >> > struct option *options; >> > @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) >> > OPT_END(), >> > }; >> > >> > + opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; >> > + >> > + git_config(git_multi_pack_index_write_config, NULL); >> > + >> >> Since this is a write-only config option it would seem more logical to >> just call git_config() once, and have a git_multip_pack_index_config, >> which then would fall back on git_default_config, so we iterate it once, >> and no need for a comment about the oddity. > > Perhaps, but I'm not crazy about each sub-command having to call > git_config() itself when 'write' is the only one that actually has any > values to read. > > FWIW, the commit-graph builtin does the same thing as is written here > (calling git_config() twice, once in cmd_commit_graph() with > git_default_config as the callback and again in cmd_commit_graph_write() > with git_commit_graph_write_config as the callback). I didn't notice your earlier d356d5debe5 (commit-graph: introduce 'commitGraph.maxNewFilters', 2020-09-17). As an aside the test added in that commit seems to be broken or not testing that code change at all, if I comment out the git_config(git_commit_graph_write_config, &opts) it'll pass. As a comment on this series I'd find 4/4 squashed into 3/4 easier to read, when I did a "git blame" and found d356d5debe5 I discovered the test right away, if and when this gets merged someone might do the same, but not find the test as easily (they'd probably then grep the config variable name and find it eventually...). More importantly, the same issue with the commit-graph test seems to be the case here, if I comment out the added config reading code it'll still pass, it seems to be testing something, but not that the config is being read. > So I'm not opposed to cleaning it up, but I'd rather be consistent with > the existing behavior. To be honest, I'm not at all convinced that > reading the config twice is a bottleneck here when compared to > generating a MIDX. It's never going to matter at all for performance, I should have been clearer with my comments. I meant them purely as a "this code is hard to follow" comment. I.e. since we read the config twice, and in both commit-graph.c and multi-pack-index.c munge and write to the "opts" struct on parse_options(), you'll need to follow logic like: 1. Read config in cmd_X(), might set variable xyz 2. Do parse_options() in cmd_X(), might set variable xyz also 3. Now in cmd_X_subcmd(), read config, might set variable xyz 4. Do parse_options() in cmd_X(), migh set variable xyz also Of course in this case the relevant opts.flags only matters for the "write" subcommand, so on more careful reading we don't need to worry about the value flip-flopping between config defaults and getopts settings, but just in terms of establishing a pattern we'll be following in the subcommand built-ins I think this is setting us up for more complexity than is needed. As far as being consistent with existing behavior, in git-worktree, git-stash which are both similarly structured subcommands we follow the pattern of calling git_config() once, it seems to me better to follow that pattern than the one in d356d5debe5 if the config can be unambiguously parsed in one pass.
On Thu, Sep 09 2021, Ævar Arnfjörð Bjarmason wrote: > On Tue, Sep 07 2021, Taylor Blau wrote: > >> On Wed, Sep 08, 2021 at 03:40:19AM +0200, Ævar Arnfjörð Bjarmason wrote: >>> >>> On Tue, Sep 07 2021, Taylor Blau wrote: >>> >>> > +static int git_multi_pack_index_write_config(const char *var, const char *value, >>> > + void *cb) >>> > +{ >>> > + if (!strcmp(var, "pack.writebitmaphashcache")) { >>> > + if (git_config_bool(var, value)) >>> > + opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; >>> > + else >>> > + opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE; >>> > + } >>> > + >>> > + /* >>> > + * No need to fall-back to 'git_default_config', since this was already >>> > + * called in 'cmd_multi_pack_index()'. >>> > + */ >>> > + return 0; >>> > +} >>> > + >>> > static int cmd_multi_pack_index_write(int argc, const char **argv) >>> > { >>> > struct option *options; >>> > @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) >>> > OPT_END(), >>> > }; >>> > >>> > + opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; >>> > + >>> > + git_config(git_multi_pack_index_write_config, NULL); >>> > + >>> >>> Since this is a write-only config option it would seem more logical to >>> just call git_config() once, and have a git_multip_pack_index_config, >>> which then would fall back on git_default_config, so we iterate it once, >>> and no need for a comment about the oddity. >> >> Perhaps, but I'm not crazy about each sub-command having to call >> git_config() itself when 'write' is the only one that actually has any >> values to read. >> >> FWIW, the commit-graph builtin does the same thing as is written here >> (calling git_config() twice, once in cmd_commit_graph() with >> git_default_config as the callback and again in cmd_commit_graph_write() >> with git_commit_graph_write_config as the callback). > > I didn't notice your earlier d356d5debe5 (commit-graph: introduce > 'commitGraph.maxNewFilters', 2020-09-17). As an aside the test added in > that commit seems to be broken or not testing that code change at all, > if I comment out the git_config(git_commit_graph_write_config, &opts) > it'll pass. > > As a comment on this series I'd find 4/4 squashed into 3/4 easier to > read, when I did a "git blame" and found d356d5debe5 I discovered the > test right away, if and when this gets merged someone might do the same, > but not find the test as easily (they'd probably then grep the config > variable name and find it eventually...). > > More importantly, the same issue with the commit-graph test seems to be > the case here, if I comment out the added config reading code it'll > still pass, it seems to be testing something, but not that the config is > being read. > >> So I'm not opposed to cleaning it up, but I'd rather be consistent with >> the existing behavior. To be honest, I'm not at all convinced that >> reading the config twice is a bottleneck here when compared to >> generating a MIDX. > > It's never going to matter at all for performance, I should have been > clearer with my comments. I meant them purely as a "this code is hard to > follow" comment. > > I.e. since we read the config twice, and in both commit-graph.c and > multi-pack-index.c munge and write to the "opts" struct on > parse_options(), you'll need to follow logic like: > > 1. Read config in cmd_X(), might set variable xyz > 2. Do parse_options() in cmd_X(), might set variable xyz also > 3. Now in cmd_X_subcmd(), read config, might set variable xyz > 4. Do parse_options() in cmd_X(), migh set variable xyz also > > Of course in this case the relevant opts.flags only matters for the > "write" subcommand, so on more careful reading we don't need to worry > about the value flip-flopping between config defaults and getopts > settings, but just in terms of establishing a pattern we'll be following > in the subcommand built-ins I think this is setting us up for more > complexity than is needed. > > As far as being consistent with existing behavior, in git-worktree, > git-stash which are both similarly structured subcommands we follow the > pattern of calling git_config() once, it seems to me better to follow > that pattern than the one in d356d5debe5 if the config can be > unambiguously parsed in one pass. In similar spirit as my https://lore.kernel.org/git/87v93bidhn.fsf@evledraar.gmail.com/ I started seeing if not doing the flags via getopt but instead variables & setting the flags later was better, and came up with this on top. Not for this series, more to muse on how we can write these subcommands in a simpler manner (or not). I may have discovered a subtle bug in the process, in cmd_multi_pack_index_repack() we end up calling write_midx_internal(), which cares about MIDX_WRITE_REV_INDEX, but only cmd_multi_pack_index_write() will set that flag, both before & after my patch. Are we using the wrong flags during repack as a result? diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index dd1652531bf..1b97b2ee4e1 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -45,14 +45,16 @@ static char const * const builtin_multi_pack_index_usage[] = { static struct opts_multi_pack_index { const char *object_dir; const char *preferred_pack; - unsigned long batch_size; - unsigned flags; -} opts; + int progress; + int write_bitmap_hash_cache; +} opts = { + .write_bitmap_hash_cache = -1, +}; static struct option common_opts[] = { OPT_FILENAME(0, "object-dir", &opts.object_dir, N_("object directory containing set of packfile and pack-index pairs")), - OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS), + OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), OPT_END(), }; @@ -61,38 +63,29 @@ static struct option *add_common_options(struct option *prev) return parse_options_concat(common_opts, prev); } -static int git_multi_pack_index_write_config(const char *var, const char *value, - void *cb) +static int git_multi_pack_index_config(const char *var, const char *value, + void *cb) { if (!strcmp(var, "pack.writebitmaphashcache")) { - if (git_config_bool(var, value)) - opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; - else - opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE; + opts.write_bitmap_hash_cache = git_config_bool(var, value); + return 0; } - /* - * No need to fall-back to 'git_default_config', since this was already - * called in 'cmd_multi_pack_index()'. - */ - return 0; + return git_default_config(var, value, NULL); } static int cmd_multi_pack_index_write(int argc, const char **argv) { struct option *options; + static int write_bitmap = 0; static struct option builtin_multi_pack_index_write_options[] = { OPT_STRING(0, "preferred-pack", &opts.preferred_pack, N_("preferred-pack"), N_("pack for reuse when computing a multi-pack bitmap")), - OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"), - MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX), + OPT_BOOL(0, "bitmap", &write_bitmap, N_("write multi-pack bitmap")), OPT_END(), }; - - opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; - - git_config(git_multi_pack_index_write_config, NULL); + unsigned flags = 0; options = add_common_options(builtin_multi_pack_index_write_options); @@ -107,8 +100,15 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) FREE_AND_NULL(options); - return write_midx_file(opts.object_dir, opts.preferred_pack, - opts.flags); + if (opts.progress) + flags |= MIDX_PROGRESS; + /* Both -1 default and 1 via config */ + if (!opts.write_bitmap_hash_cache) + flags |= MIDX_WRITE_BITMAP_HASH_CACHE; + if (write_bitmap) + flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX; + + return write_midx_file(opts.object_dir, opts.preferred_pack, flags); } static int cmd_multi_pack_index_verify(int argc, const char **argv) @@ -124,7 +124,7 @@ static int cmd_multi_pack_index_verify(int argc, const char **argv) usage_with_options(builtin_multi_pack_index_verify_usage, options); - return verify_midx_file(the_repository, opts.object_dir, opts.flags); + return verify_midx_file(the_repository, opts.object_dir, opts.progress); } static int cmd_multi_pack_index_expire(int argc, const char **argv) @@ -140,14 +140,15 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv) usage_with_options(builtin_multi_pack_index_expire_usage, options); - return expire_midx_packs(the_repository, opts.object_dir, opts.flags); + return expire_midx_packs(the_repository, opts.object_dir, opts.progress); } static int cmd_multi_pack_index_repack(int argc, const char **argv) { + static unsigned long batch_size = 0; struct option *options; static struct option builtin_multi_pack_index_repack_options[] = { - OPT_MAGNITUDE(0, "batch-size", &opts.batch_size, + OPT_MAGNITUDE(0, "batch-size", &batch_size, N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")), OPT_END(), }; @@ -167,7 +168,8 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv) FREE_AND_NULL(options); return midx_repack(the_repository, opts.object_dir, - (size_t)opts.batch_size, opts.flags); + (size_t)batch_size, + opts.progress ? MIDX_PROGRESS : 0); } int cmd_multi_pack_index(int argc, const char **argv, @@ -175,10 +177,10 @@ int cmd_multi_pack_index(int argc, const char **argv, { struct option *builtin_multi_pack_index_options = common_opts; - git_config(git_default_config, NULL); + git_config(git_multi_pack_index_config, NULL); if (isatty(2)) - opts.flags |= MIDX_PROGRESS; + opts.progress = 1; argc = parse_options(argc, argv, prefix, builtin_multi_pack_index_options, builtin_multi_pack_index_usage, diff --git a/midx.c b/midx.c index 6c35dcd557c..3e722888d69 100644 --- a/midx.c +++ b/midx.c @@ -1482,7 +1482,7 @@ static int compare_pair_pos_vs_id(const void *_a, const void *_b) display_progress(progress, _n); \ } while (0) -int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags) +int verify_midx_file(struct repository *r, const char *object_dir, int opt_progress) { struct pair_pos_vs_id *pairs = NULL; uint32_t i; @@ -1505,7 +1505,7 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag if (!midx_checksum_valid(m)) midx_report(_("incorrect checksum")); - if (flags & MIDX_PROGRESS) + if (opt_progress) progress = start_delayed_progress(_("Looking for referenced packfiles"), m->num_packs); for (i = 0; i < m->num_packs; i++) { @@ -1534,7 +1534,7 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag return verify_midx_error; } - if (flags & MIDX_PROGRESS) + if (opt_progress) progress = start_sparse_progress(_("Verifying OID order in multi-pack-index"), m->num_objects - 1); for (i = 0; i < m->num_objects - 1; i++) { @@ -1563,14 +1563,14 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag pairs[i].pack_int_id = nth_midxed_pack_int_id(m, i); } - if (flags & MIDX_PROGRESS) + if (opt_progress) progress = start_sparse_progress(_("Sorting objects by packfile"), m->num_objects); display_progress(progress, 0); /* TODO: Measure QSORT() progress */ QSORT(pairs, m->num_objects, compare_pair_pos_vs_id); stop_progress(&progress); - if (flags & MIDX_PROGRESS) + if (opt_progress) progress = start_sparse_progress(_("Verifying object offsets"), m->num_objects); for (i = 0; i < m->num_objects; i++) { struct object_id oid; diff --git a/midx.h b/midx.h index 541d9ac728d..0dfe6a54ef3 100644 --- a/midx.h +++ b/midx.h @@ -64,7 +64,7 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags); void clear_midx_file(struct repository *r); -int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags); +int verify_midx_file(struct repository *r, const char *object_dir, int opt_progress); int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags); int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags);
On Thu, Sep 09, 2021 at 10:18:10AM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Sep 07 2021, Taylor Blau wrote: > > > On Wed, Sep 08, 2021 at 03:40:19AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > >> On Tue, Sep 07 2021, Taylor Blau wrote: > >> > >> > +static int git_multi_pack_index_write_config(const char *var, const char *value, > >> > + void *cb) > >> > +{ > >> > + if (!strcmp(var, "pack.writebitmaphashcache")) { > >> > + if (git_config_bool(var, value)) > >> > + opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; > >> > + else > >> > + opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE; > >> > + } > >> > + > >> > + /* > >> > + * No need to fall-back to 'git_default_config', since this was already > >> > + * called in 'cmd_multi_pack_index()'. > >> > + */ > >> > + return 0; > >> > +} > >> > + > >> > static int cmd_multi_pack_index_write(int argc, const char **argv) > >> > { > >> > struct option *options; > >> > @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) > >> > OPT_END(), > >> > }; > >> > > >> > + opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; > >> > + > >> > + git_config(git_multi_pack_index_write_config, NULL); > >> > + > >> > >> Since this is a write-only config option it would seem more logical to > >> just call git_config() once, and have a git_multip_pack_index_config, > >> which then would fall back on git_default_config, so we iterate it once, > >> and no need for a comment about the oddity. > > > > Perhaps, but I'm not crazy about each sub-command having to call > > git_config() itself when 'write' is the only one that actually has any > > values to read. > > > > FWIW, the commit-graph builtin does the same thing as is written here > > (calling git_config() twice, once in cmd_commit_graph() with > > git_default_config as the callback and again in cmd_commit_graph_write() > > with git_commit_graph_write_config as the callback). > > I didn't notice your earlier d356d5debe5 (commit-graph: introduce > 'commitGraph.maxNewFilters', 2020-09-17). As an aside the test added in > that commit seems to be broken or not testing that code change at all, > if I comment out the git_config(git_commit_graph_write_config, &opts) > it'll pass. That makes sense; the test that d356d5debe5 added is ensuring that the option `--max-new-filters` overrides any configured value of `commitGraph.maxNewFilters` (so not reading the configuration would be fine there). > More importantly, the same issue with the commit-graph test seems to be > the case here, if I comment out the added config reading code it'll > still pass, it seems to be testing something, but not that the config is > being read. I think this also makes sense; since MIDX_WRITE_BITMAP_HASH_CACHE is the default and is set in cmd_multi_pack_index_write(). So it may be worth adding a test to say "make sure the hash-cache _isn't_ written when I: git config pack.writeBitmapHashCache && git multi-pack-index write --bitmap But I don't feel strongly about it (hence I didn't write such a test in the original version which I sent here). If you think it would be helpful in a newer version, I'm happy to add it. Thanks, Taylor
On Thu, Sep 09, 2021 at 11:34:16AM +0200, Ævar Arnfjörð Bjarmason wrote: > In similar spirit as my > https://lore.kernel.org/git/87v93bidhn.fsf@evledraar.gmail.com/ I > started seeing if not doing the flags via getopt but instead variables & > setting the flags later was better, and came up with this on top. Not > for this series, more to muse on how we can write these subcommands in a > simpler manner (or not). Sure, I think that everything you wrote here is correct. I don't have a strong opinion, really, but one benefit of not manipulating a single 'flags' int is that we don't have to do stuff like: if (git_config_bool(var, value)) opts.flags |= FLAG_FOO; else opts.flags &= ~FLAG_FOO; and instead can write `opts.foo = git_config_bool(var, value)`. Of course, the trade-off is that you later have to turn `opts.foo` into flags at some point (or pass each option as an individual parameter). So nothing's free, and I see it as a toss-up between which is easier to read and write. > I may have discovered a subtle bug in the process, in > cmd_multi_pack_index_repack() we end up calling write_midx_internal(), > which cares about MIDX_WRITE_REV_INDEX, but only > cmd_multi_pack_index_write() will set that flag, both before & after my > patch. Are we using the wrong flags during repack as a result? Only the `write` sub-command would ever want to set that flag, since we don't support writing a bitmap after `expire`. So that part seems right, but perhaps there is a another problem you're seeing? Thanks, Taylor
On Thu, Sep 09 2021, Taylor Blau wrote: > On Thu, Sep 09, 2021 at 11:34:16AM +0200, Ævar Arnfjörð Bjarmason wrote: >> In similar spirit as my >> https://lore.kernel.org/git/87v93bidhn.fsf@evledraar.gmail.com/ I >> started seeing if not doing the flags via getopt but instead variables & >> setting the flags later was better, and came up with this on top. Not >> for this series, more to muse on how we can write these subcommands in a >> simpler manner (or not). > > Sure, I think that everything you wrote here is correct. I don't have a > strong opinion, really, but one benefit of not manipulating a single > 'flags' int is that we don't have to do stuff like: > > if (git_config_bool(var, value)) > opts.flags |= FLAG_FOO; > else > opts.flags &= ~FLAG_FOO; > > and instead can write `opts.foo = git_config_bool(var, value)`. *nod* > Of course, the trade-off is that you later have to turn `opts.foo` into > flags at some point (or pass each option as an individual parameter). So > nothing's free, and I see it as a toss-up between which is easier to > read and write. I think that trade-off is usually a benefit, also in the pack-write.c etc. case, i.e. you enforce a clear boundary between the built-in and the underlying API, and don't have to e.g. wonder if some write flag is handled by verify() (which will just care about the progress flag), as you pass some moral equivalent of a "struct all_the_options_for_all_the_things" around between all of them. I realize trying to solve that problem from a different angle may be how you ended up going for per-subcommand config reading.... >> I may have discovered a subtle bug in the process, in >> cmd_multi_pack_index_repack() we end up calling write_midx_internal(), >> which cares about MIDX_WRITE_REV_INDEX, but only >> cmd_multi_pack_index_write() will set that flag, both before & after my >> patch. Are we using the wrong flags during repack as a result? > > Only the `write` sub-command would ever want to set that flag, since we > don't support writing a bitmap after `expire`. So that part seems right, > but perhaps there is a another problem you're seeing? In midx_repack() we'll call write_midx_internal(). That function gets the "flags" we pass to midx_repack() and will check MIDX_WRITE_REV_INDEX. I haven't checked whether we actually reach that, but that's what I was wondering, i.e. whether the repack routine would "write" when repacking, and we missed the flag option there.
On Thu, Sep 09, 2021 at 05:50:47PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> I may have discovered a subtle bug in the process, in > >> cmd_multi_pack_index_repack() we end up calling write_midx_internal(), > >> which cares about MIDX_WRITE_REV_INDEX, but only > >> cmd_multi_pack_index_write() will set that flag, both before & after my > >> patch. Are we using the wrong flags during repack as a result? > > > > Only the `write` sub-command would ever want to set that flag, since we > > don't support writing a bitmap after `expire`. So that part seems right, > > but perhaps there is a another problem you're seeing? > > In midx_repack() we'll call write_midx_internal(). That function gets > the "flags" we pass to midx_repack() and will check > MIDX_WRITE_REV_INDEX. I haven't checked whether we actually reach that, > but that's what I was wondering, i.e. whether the repack routine would > "write" when repacking, and we missed the flag option there. I don't think it's a problem in practice. We would never have MIDX_WRITE_REV_INDEX set when executing cmd_multi_pack_index_repack(), (and the same is true for all other subcommands besides `write`) because the default value for flags is just MIDX_PROGRESS (if isatty(2)), and we only add the WRITE_REV_INDEX bit from within the write handler. More generally, that is to say "we only support writing a bitmap from the `write` sub-command". There is no reason that we couldn't lift this limitation and support writing a bitmap on the resulting MIDX after `expire` or `repack` we just haven't done so. But I don't see any problems with not getting the right flags, etc. Thanks, Taylor
Taylor Blau <me@ttaylorr.com> writes: > +static int git_multi_pack_index_write_config(const char *var, const char *value, > + void *cb) > +{ > + if (!strcmp(var, "pack.writebitmaphashcache")) { > + if (git_config_bool(var, value)) > + opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; > + else > + opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE; > + } > + > + /* > + * No need to fall-back to 'git_default_config', since this was already > + * called in 'cmd_multi_pack_index()'. > + */ It's probably not just "No need to", but calling default_config() or any "more generic" config this late is a wrong pattern, as the one that was already called in the caller may have been overridden by the command line options parser, and calling "more generic" config callback after that would be a no-no. So I think this should read "we should never make a fall-back call to ...", instead. Granted, cmd_multi_pack_index() only uses default_config(), and what it sets will not be overridden by the command line argument parser of the multi-pack-index command, so it is not a problem yet, but if we ever introduce configuration variables that are applicable to multiple subcommands of multi-pack-index, this distinction may start to matter. Unless a Git subcommand cannot work with just a single call to git_config() with a callback upfront, I actually think it may be better for them to issue a more pointed git_config_get_*() on the variables they are interested in, never making a separate call to git_config(). But that can be left for the future.
On Sun, Sep 12, 2021 at 05:38:36PM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > +static int git_multi_pack_index_write_config(const char *var, const char *value, > > + void *cb) > > +{ > > + if (!strcmp(var, "pack.writebitmaphashcache")) { > > + if (git_config_bool(var, value)) > > + opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; > > + else > > + opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE; > > + } > > + > > + /* > > + * No need to fall-back to 'git_default_config', since this was already > > + * called in 'cmd_multi_pack_index()'. > > + */ > > It's probably not just "No need to", but calling default_config() or > any "more generic" config this late is a wrong pattern [...] Makes sense to me, will apply. Thanks! Thanks, Taylor
diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt index 763f7af7c4..ad7f73a1ea 100644 --- a/Documentation/config/pack.txt +++ b/Documentation/config/pack.txt @@ -159,6 +159,10 @@ pack.writeBitmapHashCache:: between an older, bitmapped pack and objects that have been pushed since the last gc). The downside is that it consumes 4 bytes per object of disk space. Defaults to true. ++ +When writing a multi-pack reachability bitmap, no new namehashes are +computed; instead, any namehashes stored in an existing bitmap are +permuted into their appropriate location when writing a new bitmap. pack.writeReverseIndex:: When true, git will write a corresponding .rev file (see: diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index 73c0113b48..dd1652531b 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -61,6 +61,23 @@ static struct option *add_common_options(struct option *prev) return parse_options_concat(common_opts, prev); } +static int git_multi_pack_index_write_config(const char *var, const char *value, + void *cb) +{ + if (!strcmp(var, "pack.writebitmaphashcache")) { + if (git_config_bool(var, value)) + opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; + else + opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE; + } + + /* + * No need to fall-back to 'git_default_config', since this was already + * called in 'cmd_multi_pack_index()'. + */ + return 0; +} + static int cmd_multi_pack_index_write(int argc, const char **argv) { struct option *options; @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) OPT_END(), }; + opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE; + + git_config(git_multi_pack_index_write_config, NULL); + options = add_common_options(builtin_multi_pack_index_write_options); trace2_cmd_mode(argv[0]); diff --git a/midx.c b/midx.c index ccdc3e5702..6c35dcd557 100644 --- a/midx.c +++ b/midx.c @@ -1008,9 +1008,13 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, struct pack_idx_entry **index; struct commit **commits = NULL; uint32_t i, commits_nr; + uint16_t options = 0; char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash)); int ret; + if (flags & MIDX_WRITE_BITMAP_HASH_CACHE) + options |= BITMAP_OPT_HASH_CACHE; + prepare_midx_packing_data(&pdata, ctx); commits = find_commits_for_midx_bitmap(&commits_nr, ctx); @@ -1049,7 +1053,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, goto cleanup; bitmap_writer_set_checksum(midx_hash); - bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0); + bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, options); cleanup: free(index); diff --git a/midx.h b/midx.h index aa3da557bb..541d9ac728 100644 --- a/midx.h +++ b/midx.h @@ -44,6 +44,7 @@ struct multi_pack_index { #define MIDX_PROGRESS (1 << 0) #define MIDX_WRITE_REV_INDEX (1 << 1) #define MIDX_WRITE_BITMAP (1 << 2) +#define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3) const unsigned char *get_midx_checksum(struct multi_pack_index *m); char *get_midx_filename(const char *object_dir);
In the previous commit, the bitmap writing code learned to propagate values from an existing hash-cache extension into the bitmap that it is writing. Now that that functionality exists, let's expose it by teaching the 'git multi-pack-index' builtin to respect the `pack.writeBitmapHashCache` option so that the hash-cache may be written at all. Two minor points worth noting here: - The 'git multi-pack-index write' sub-command didn't previously read any configuration (instead this is handled in the base command). A separate handler is added here to respect this write-specific config option. - I briefly considered adding a 'bitmap_flags' field to the static options struct, but decided against it since it would require plumbing through a new parameter to the write_midx_file() function. Instead, a new MIDX-specific flag is added, which is translated to the corresponding bitmap one. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- Documentation/config/pack.txt | 4 ++++ builtin/multi-pack-index.c | 21 +++++++++++++++++++++ midx.c | 6 +++++- midx.h | 1 + 4 files changed, 31 insertions(+), 1 deletion(-)