diff mbox series

[3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps

Message ID be8f47e13c612f2fbe4d5f4f49794529b9424664.1631049462.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-bitmap: permute existing namehash values | expand

Commit Message

Taylor Blau Sept. 7, 2021, 9:18 p.m. UTC
In the previous commit, the bitmap writing code learned to propagate
values from an existing hash-cache extension into the bitmap that it is
writing.

Now that that functionality exists, let's expose it by teaching the 'git
multi-pack-index' builtin to respect the `pack.writeBitmapHashCache`
option so that the hash-cache may be written at all.

Two minor points worth noting here:

  - The 'git multi-pack-index write' sub-command didn't previously read
    any configuration (instead this is handled in the base command). A
    separate handler is added here to respect this write-specific
    config option.

  - I briefly considered adding a 'bitmap_flags' field to the static
    options struct, but decided against it since it would require
    plumbing through a new parameter to the write_midx_file() function.

    Instead, a new MIDX-specific flag is added, which is translated to
    the corresponding bitmap one.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/pack.txt |  4 ++++
 builtin/multi-pack-index.c    | 21 +++++++++++++++++++++
 midx.c                        |  6 +++++-
 midx.h                        |  1 +
 4 files changed, 31 insertions(+), 1 deletion(-)

Comments

Ævar Arnfjörð Bjarmason Sept. 8, 2021, 1:40 a.m. UTC | #1
On Tue, Sep 07 2021, Taylor Blau wrote:

> +static int git_multi_pack_index_write_config(const char *var, const char *value,
> +					     void *cb)
> +{
> +	if (!strcmp(var, "pack.writebitmaphashcache")) {
> +		if (git_config_bool(var, value))
> +			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> +		else
> +			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
> +	}
> +
> +	/*
> +	 * No need to fall-back to 'git_default_config', since this was already
> +	 * called in 'cmd_multi_pack_index()'.
> +	 */
> +	return 0;
> +}
> +
>  static int cmd_multi_pack_index_write(int argc, const char **argv)
>  {
>  	struct option *options;
> @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
>  		OPT_END(),
>  	};
>  
> +	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> +
> +	git_config(git_multi_pack_index_write_config, NULL);
> +

Since this is a write-only config option it would seem more logical to
just call git_config() once, and have a git_multip_pack_index_config,
which then would fall back on git_default_config, so we iterate it once,
and no need for a comment about the oddity.
Taylor Blau Sept. 8, 2021, 2:28 a.m. UTC | #2
On Wed, Sep 08, 2021 at 03:40:19AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Sep 07 2021, Taylor Blau wrote:
>
> > +static int git_multi_pack_index_write_config(const char *var, const char *value,
> > +					     void *cb)
> > +{
> > +	if (!strcmp(var, "pack.writebitmaphashcache")) {
> > +		if (git_config_bool(var, value))
> > +			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> > +		else
> > +			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
> > +	}
> > +
> > +	/*
> > +	 * No need to fall-back to 'git_default_config', since this was already
> > +	 * called in 'cmd_multi_pack_index()'.
> > +	 */
> > +	return 0;
> > +}
> > +
> >  static int cmd_multi_pack_index_write(int argc, const char **argv)
> >  {
> >  	struct option *options;
> > @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
> >  		OPT_END(),
> >  	};
> >
> > +	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> > +
> > +	git_config(git_multi_pack_index_write_config, NULL);
> > +
>
> Since this is a write-only config option it would seem more logical to
> just call git_config() once, and have a git_multip_pack_index_config,
> which then would fall back on git_default_config, so we iterate it once,
> and no need for a comment about the oddity.

Perhaps, but I'm not crazy about each sub-command having to call
git_config() itself when 'write' is the only one that actually has any
values to read.

FWIW, the commit-graph builtin does the same thing as is written here
(calling git_config() twice, once in cmd_commit_graph() with
git_default_config as the callback and again in cmd_commit_graph_write()
with git_commit_graph_write_config as the callback).

So I'm not opposed to cleaning it up, but I'd rather be consistent with
the existing behavior. To be honest, I'm not at all convinced that
reading the config twice is a bottleneck here when compared to
generating a MIDX.

Thanks,
Taylor
Ævar Arnfjörð Bjarmason Sept. 9, 2021, 8:18 a.m. UTC | #3
On Tue, Sep 07 2021, Taylor Blau wrote:

> On Wed, Sep 08, 2021 at 03:40:19AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Tue, Sep 07 2021, Taylor Blau wrote:
>>
>> > +static int git_multi_pack_index_write_config(const char *var, const char *value,
>> > +					     void *cb)
>> > +{
>> > +	if (!strcmp(var, "pack.writebitmaphashcache")) {
>> > +		if (git_config_bool(var, value))
>> > +			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
>> > +		else
>> > +			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
>> > +	}
>> > +
>> > +	/*
>> > +	 * No need to fall-back to 'git_default_config', since this was already
>> > +	 * called in 'cmd_multi_pack_index()'.
>> > +	 */
>> > +	return 0;
>> > +}
>> > +
>> >  static int cmd_multi_pack_index_write(int argc, const char **argv)
>> >  {
>> >  	struct option *options;
>> > @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
>> >  		OPT_END(),
>> >  	};
>> >
>> > +	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
>> > +
>> > +	git_config(git_multi_pack_index_write_config, NULL);
>> > +
>>
>> Since this is a write-only config option it would seem more logical to
>> just call git_config() once, and have a git_multip_pack_index_config,
>> which then would fall back on git_default_config, so we iterate it once,
>> and no need for a comment about the oddity.
>
> Perhaps, but I'm not crazy about each sub-command having to call
> git_config() itself when 'write' is the only one that actually has any
> values to read.
>
> FWIW, the commit-graph builtin does the same thing as is written here
> (calling git_config() twice, once in cmd_commit_graph() with
> git_default_config as the callback and again in cmd_commit_graph_write()
> with git_commit_graph_write_config as the callback).

I didn't notice your earlier d356d5debe5 (commit-graph: introduce
'commitGraph.maxNewFilters', 2020-09-17). As an aside the test added in
that commit seems to be broken or not testing that code change at all,
if I comment out the git_config(git_commit_graph_write_config, &opts)
it'll pass.

As a comment on this series I'd find 4/4 squashed into 3/4 easier to
read, when I did a "git blame" and found d356d5debe5 I discovered the
test right away, if and when this gets merged someone might do the same,
but not find the test as easily (they'd probably then grep the config
variable name and find it eventually...).

More importantly, the same issue with the commit-graph test seems to be
the case here, if I comment out the added config reading code it'll
still pass, it seems to be testing something, but not that the config is
being read.

> So I'm not opposed to cleaning it up, but I'd rather be consistent with
> the existing behavior. To be honest, I'm not at all convinced that
> reading the config twice is a bottleneck here when compared to
> generating a MIDX.

It's never going to matter at all for performance, I should have been
clearer with my comments. I meant them purely as a "this code is hard to
follow" comment.

I.e. since we read the config twice, and in both commit-graph.c and
multi-pack-index.c munge and write to the "opts" struct on
parse_options(), you'll need to follow logic like:

    1. Read config in cmd_X(), might set variable xyz
    2. Do parse_options() in cmd_X(), might set variable xyz also
    3. Now in cmd_X_subcmd(), read config, might set variable xyz
    4. Do parse_options() in cmd_X(), migh set variable xyz also

Of course in this case the relevant opts.flags only matters for the
"write" subcommand, so on more careful reading we don't need to worry
about the value flip-flopping between config defaults and getopts
settings, but just in terms of establishing a pattern we'll be following
in the subcommand built-ins I think this is setting us up for more
complexity than is needed.

As far as being consistent with existing behavior, in git-worktree,
git-stash which are both similarly structured subcommands we follow the
pattern of calling git_config() once, it seems to me better to follow
that pattern than the one in d356d5debe5 if the config can be
unambiguously parsed in one pass.
Ævar Arnfjörð Bjarmason Sept. 9, 2021, 9:34 a.m. UTC | #4
On Thu, Sep 09 2021, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Sep 07 2021, Taylor Blau wrote:
>
>> On Wed, Sep 08, 2021 at 03:40:19AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> On Tue, Sep 07 2021, Taylor Blau wrote:
>>>
>>> > +static int git_multi_pack_index_write_config(const char *var, const char *value,
>>> > +					     void *cb)
>>> > +{
>>> > +	if (!strcmp(var, "pack.writebitmaphashcache")) {
>>> > +		if (git_config_bool(var, value))
>>> > +			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
>>> > +		else
>>> > +			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
>>> > +	}
>>> > +
>>> > +	/*
>>> > +	 * No need to fall-back to 'git_default_config', since this was already
>>> > +	 * called in 'cmd_multi_pack_index()'.
>>> > +	 */
>>> > +	return 0;
>>> > +}
>>> > +
>>> >  static int cmd_multi_pack_index_write(int argc, const char **argv)
>>> >  {
>>> >  	struct option *options;
>>> > @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
>>> >  		OPT_END(),
>>> >  	};
>>> >
>>> > +	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
>>> > +
>>> > +	git_config(git_multi_pack_index_write_config, NULL);
>>> > +
>>>
>>> Since this is a write-only config option it would seem more logical to
>>> just call git_config() once, and have a git_multip_pack_index_config,
>>> which then would fall back on git_default_config, so we iterate it once,
>>> and no need for a comment about the oddity.
>>
>> Perhaps, but I'm not crazy about each sub-command having to call
>> git_config() itself when 'write' is the only one that actually has any
>> values to read.
>>
>> FWIW, the commit-graph builtin does the same thing as is written here
>> (calling git_config() twice, once in cmd_commit_graph() with
>> git_default_config as the callback and again in cmd_commit_graph_write()
>> with git_commit_graph_write_config as the callback).
>
> I didn't notice your earlier d356d5debe5 (commit-graph: introduce
> 'commitGraph.maxNewFilters', 2020-09-17). As an aside the test added in
> that commit seems to be broken or not testing that code change at all,
> if I comment out the git_config(git_commit_graph_write_config, &opts)
> it'll pass.
>
> As a comment on this series I'd find 4/4 squashed into 3/4 easier to
> read, when I did a "git blame" and found d356d5debe5 I discovered the
> test right away, if and when this gets merged someone might do the same,
> but not find the test as easily (they'd probably then grep the config
> variable name and find it eventually...).
>
> More importantly, the same issue with the commit-graph test seems to be
> the case here, if I comment out the added config reading code it'll
> still pass, it seems to be testing something, but not that the config is
> being read.
>
>> So I'm not opposed to cleaning it up, but I'd rather be consistent with
>> the existing behavior. To be honest, I'm not at all convinced that
>> reading the config twice is a bottleneck here when compared to
>> generating a MIDX.
>
> It's never going to matter at all for performance, I should have been
> clearer with my comments. I meant them purely as a "this code is hard to
> follow" comment.
>
> I.e. since we read the config twice, and in both commit-graph.c and
> multi-pack-index.c munge and write to the "opts" struct on
> parse_options(), you'll need to follow logic like:
>
>     1. Read config in cmd_X(), might set variable xyz
>     2. Do parse_options() in cmd_X(), might set variable xyz also
>     3. Now in cmd_X_subcmd(), read config, might set variable xyz
>     4. Do parse_options() in cmd_X(), migh set variable xyz also
>
> Of course in this case the relevant opts.flags only matters for the
> "write" subcommand, so on more careful reading we don't need to worry
> about the value flip-flopping between config defaults and getopts
> settings, but just in terms of establishing a pattern we'll be following
> in the subcommand built-ins I think this is setting us up for more
> complexity than is needed.
>
> As far as being consistent with existing behavior, in git-worktree,
> git-stash which are both similarly structured subcommands we follow the
> pattern of calling git_config() once, it seems to me better to follow
> that pattern than the one in d356d5debe5 if the config can be
> unambiguously parsed in one pass.

In similar spirit as my
https://lore.kernel.org/git/87v93bidhn.fsf@evledraar.gmail.com/ I
started seeing if not doing the flags via getopt but instead variables &
setting the flags later was better, and came up with this on top. Not
for this series, more to muse on how we can write these subcommands in a
simpler manner (or not).

I may have discovered a subtle bug in the process, in
cmd_multi_pack_index_repack() we end up calling write_midx_internal(),
which cares about MIDX_WRITE_REV_INDEX, but only
cmd_multi_pack_index_write() will set that flag, both before & after my
patch. Are we using the wrong flags during repack as a result?

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index dd1652531bf..1b97b2ee4e1 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -45,14 +45,16 @@ static char const * const builtin_multi_pack_index_usage[] = {
 static struct opts_multi_pack_index {
 	const char *object_dir;
 	const char *preferred_pack;
-	unsigned long batch_size;
-	unsigned flags;
-} opts;
+	int progress;
+	int write_bitmap_hash_cache;
+} opts = {
+	.write_bitmap_hash_cache = -1,
+};
 
 static struct option common_opts[] = {
 	OPT_FILENAME(0, "object-dir", &opts.object_dir,
 	  N_("object directory containing set of packfile and pack-index pairs")),
-	OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
+	OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
 	OPT_END(),
 };
 
@@ -61,38 +63,29 @@ static struct option *add_common_options(struct option *prev)
 	return parse_options_concat(common_opts, prev);
 }
 
-static int git_multi_pack_index_write_config(const char *var, const char *value,
-					     void *cb)
+static int git_multi_pack_index_config(const char *var, const char *value,
+				       void *cb)
 {
 	if (!strcmp(var, "pack.writebitmaphashcache")) {
-		if (git_config_bool(var, value))
-			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
-		else
-			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
+		opts.write_bitmap_hash_cache = git_config_bool(var, value);
+		return 0;
 	}
 
-	/*
-	 * No need to fall-back to 'git_default_config', since this was already
-	 * called in 'cmd_multi_pack_index()'.
-	 */
-	return 0;
+	return git_default_config(var, value, NULL);
 }
 
 static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
 	struct option *options;
+	static int write_bitmap = 0;
 	static struct option builtin_multi_pack_index_write_options[] = {
 		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
 			   N_("preferred-pack"),
 			   N_("pack for reuse when computing a multi-pack bitmap")),
-		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
-			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
+		OPT_BOOL(0, "bitmap", &write_bitmap, N_("write multi-pack bitmap")),
 		OPT_END(),
 	};
-
-	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
-
-	git_config(git_multi_pack_index_write_config, NULL);
+	unsigned flags = 0;
 
 	options = add_common_options(builtin_multi_pack_index_write_options);
 
@@ -107,8 +100,15 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 
 	FREE_AND_NULL(options);
 
-	return write_midx_file(opts.object_dir, opts.preferred_pack,
-			       opts.flags);
+	if (opts.progress)
+		flags |= MIDX_PROGRESS;
+	/* Both -1 default and 1 via config */
+	if (!opts.write_bitmap_hash_cache)
+		flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+	if (write_bitmap)
+		flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX;
+
+	return write_midx_file(opts.object_dir, opts.preferred_pack, flags);
 }
 
 static int cmd_multi_pack_index_verify(int argc, const char **argv)
@@ -124,7 +124,7 @@ static int cmd_multi_pack_index_verify(int argc, const char **argv)
 		usage_with_options(builtin_multi_pack_index_verify_usage,
 				   options);
 
-	return verify_midx_file(the_repository, opts.object_dir, opts.flags);
+	return verify_midx_file(the_repository, opts.object_dir, opts.progress);
 }
 
 static int cmd_multi_pack_index_expire(int argc, const char **argv)
@@ -140,14 +140,15 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv)
 		usage_with_options(builtin_multi_pack_index_expire_usage,
 				   options);
 
-	return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
+	return expire_midx_packs(the_repository, opts.object_dir, opts.progress);
 }
 
 static int cmd_multi_pack_index_repack(int argc, const char **argv)
 {
+	static unsigned long batch_size = 0;
 	struct option *options;
 	static struct option builtin_multi_pack_index_repack_options[] = {
-		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
+		OPT_MAGNITUDE(0, "batch-size", &batch_size,
 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
 		OPT_END(),
 	};
@@ -167,7 +168,8 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv)
 	FREE_AND_NULL(options);
 
 	return midx_repack(the_repository, opts.object_dir,
-			   (size_t)opts.batch_size, opts.flags);
+			   (size_t)batch_size,
+			   opts.progress ? MIDX_PROGRESS : 0);
 }
 
 int cmd_multi_pack_index(int argc, const char **argv,
@@ -175,10 +177,10 @@ int cmd_multi_pack_index(int argc, const char **argv,
 {
 	struct option *builtin_multi_pack_index_options = common_opts;
 
-	git_config(git_default_config, NULL);
+	git_config(git_multi_pack_index_config, NULL);
 
 	if (isatty(2))
-		opts.flags |= MIDX_PROGRESS;
+		opts.progress = 1;
 	argc = parse_options(argc, argv, prefix,
 			     builtin_multi_pack_index_options,
 			     builtin_multi_pack_index_usage,
diff --git a/midx.c b/midx.c
index 6c35dcd557c..3e722888d69 100644
--- a/midx.c
+++ b/midx.c
@@ -1482,7 +1482,7 @@ static int compare_pair_pos_vs_id(const void *_a, const void *_b)
 			display_progress(progress, _n); \
 	} while (0)
 
-int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags)
+int verify_midx_file(struct repository *r, const char *object_dir, int opt_progress)
 {
 	struct pair_pos_vs_id *pairs = NULL;
 	uint32_t i;
@@ -1505,7 +1505,7 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 	if (!midx_checksum_valid(m))
 		midx_report(_("incorrect checksum"));
 
-	if (flags & MIDX_PROGRESS)
+	if (opt_progress)
 		progress = start_delayed_progress(_("Looking for referenced packfiles"),
 					  m->num_packs);
 	for (i = 0; i < m->num_packs; i++) {
@@ -1534,7 +1534,7 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 		return verify_midx_error;
 	}
 
-	if (flags & MIDX_PROGRESS)
+	if (opt_progress)
 		progress = start_sparse_progress(_("Verifying OID order in multi-pack-index"),
 						 m->num_objects - 1);
 	for (i = 0; i < m->num_objects - 1; i++) {
@@ -1563,14 +1563,14 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 		pairs[i].pack_int_id = nth_midxed_pack_int_id(m, i);
 	}
 
-	if (flags & MIDX_PROGRESS)
+	if (opt_progress)
 		progress = start_sparse_progress(_("Sorting objects by packfile"),
 						 m->num_objects);
 	display_progress(progress, 0); /* TODO: Measure QSORT() progress */
 	QSORT(pairs, m->num_objects, compare_pair_pos_vs_id);
 	stop_progress(&progress);
 
-	if (flags & MIDX_PROGRESS)
+	if (opt_progress)
 		progress = start_sparse_progress(_("Verifying object offsets"), m->num_objects);
 	for (i = 0; i < m->num_objects; i++) {
 		struct object_id oid;
diff --git a/midx.h b/midx.h
index 541d9ac728d..0dfe6a54ef3 100644
--- a/midx.h
+++ b/midx.h
@@ -64,7 +64,7 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i
 
 int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags);
 void clear_midx_file(struct repository *r);
-int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags);
+int verify_midx_file(struct repository *r, const char *object_dir, int opt_progress);
 int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags);
 int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags);
Taylor Blau Sept. 9, 2021, 2:47 p.m. UTC | #5
On Thu, Sep 09, 2021 at 10:18:10AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Sep 07 2021, Taylor Blau wrote:
>
> > On Wed, Sep 08, 2021 at 03:40:19AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> On Tue, Sep 07 2021, Taylor Blau wrote:
> >>
> >> > +static int git_multi_pack_index_write_config(const char *var, const char *value,
> >> > +					     void *cb)
> >> > +{
> >> > +	if (!strcmp(var, "pack.writebitmaphashcache")) {
> >> > +		if (git_config_bool(var, value))
> >> > +			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> >> > +		else
> >> > +			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
> >> > +	}
> >> > +
> >> > +	/*
> >> > +	 * No need to fall-back to 'git_default_config', since this was already
> >> > +	 * called in 'cmd_multi_pack_index()'.
> >> > +	 */
> >> > +	return 0;
> >> > +}
> >> > +
> >> >  static int cmd_multi_pack_index_write(int argc, const char **argv)
> >> >  {
> >> >  	struct option *options;
> >> > @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
> >> >  		OPT_END(),
> >> >  	};
> >> >
> >> > +	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> >> > +
> >> > +	git_config(git_multi_pack_index_write_config, NULL);
> >> > +
> >>
> >> Since this is a write-only config option it would seem more logical to
> >> just call git_config() once, and have a git_multip_pack_index_config,
> >> which then would fall back on git_default_config, so we iterate it once,
> >> and no need for a comment about the oddity.
> >
> > Perhaps, but I'm not crazy about each sub-command having to call
> > git_config() itself when 'write' is the only one that actually has any
> > values to read.
> >
> > FWIW, the commit-graph builtin does the same thing as is written here
> > (calling git_config() twice, once in cmd_commit_graph() with
> > git_default_config as the callback and again in cmd_commit_graph_write()
> > with git_commit_graph_write_config as the callback).
>
> I didn't notice your earlier d356d5debe5 (commit-graph: introduce
> 'commitGraph.maxNewFilters', 2020-09-17). As an aside the test added in
> that commit seems to be broken or not testing that code change at all,
> if I comment out the git_config(git_commit_graph_write_config, &opts)
> it'll pass.

That makes sense; the test that d356d5debe5 added is ensuring that the
option `--max-new-filters` overrides any configured value of
`commitGraph.maxNewFilters` (so not reading the configuration would be
fine there).

> More importantly, the same issue with the commit-graph test seems to be
> the case here, if I comment out the added config reading code it'll
> still pass, it seems to be testing something, but not that the config is
> being read.

I think this also makes sense; since MIDX_WRITE_BITMAP_HASH_CACHE is the
default and is set in cmd_multi_pack_index_write(). So it may be worth
adding a test to say "make sure the hash-cache _isn't_ written when I:

    git config pack.writeBitmapHashCache &&
    git multi-pack-index write --bitmap

But I don't feel strongly about it (hence I didn't write such a test in
the original version which I sent here). If you think it would be
helpful in a newer version, I'm happy to add it.

Thanks,
Taylor
Taylor Blau Sept. 9, 2021, 2:55 p.m. UTC | #6
On Thu, Sep 09, 2021 at 11:34:16AM +0200, Ævar Arnfjörð Bjarmason wrote:
> In similar spirit as my
> https://lore.kernel.org/git/87v93bidhn.fsf@evledraar.gmail.com/ I
> started seeing if not doing the flags via getopt but instead variables &
> setting the flags later was better, and came up with this on top. Not
> for this series, more to muse on how we can write these subcommands in a
> simpler manner (or not).

Sure, I think that everything you wrote here is correct. I don't have a
strong opinion, really, but one benefit of not manipulating a single
'flags' int is that we don't have to do stuff like:

  if (git_config_bool(var, value))
    opts.flags |= FLAG_FOO;
  else
    opts.flags &= ~FLAG_FOO;

and instead can write `opts.foo = git_config_bool(var, value)`.

Of course, the trade-off is that you later have to turn `opts.foo` into
flags at some point (or pass each option as an individual parameter). So
nothing's free, and I see it as a toss-up between which is easier to
read and write.

> I may have discovered a subtle bug in the process, in
> cmd_multi_pack_index_repack() we end up calling write_midx_internal(),
> which cares about MIDX_WRITE_REV_INDEX, but only
> cmd_multi_pack_index_write() will set that flag, both before & after my
> patch. Are we using the wrong flags during repack as a result?

Only the `write` sub-command would ever want to set that flag, since we
don't support writing a bitmap after `expire`. So that part seems right,
but perhaps there is a another problem you're seeing?

Thanks,
Taylor
Ævar Arnfjörð Bjarmason Sept. 9, 2021, 3:50 p.m. UTC | #7
On Thu, Sep 09 2021, Taylor Blau wrote:

> On Thu, Sep 09, 2021 at 11:34:16AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> In similar spirit as my
>> https://lore.kernel.org/git/87v93bidhn.fsf@evledraar.gmail.com/ I
>> started seeing if not doing the flags via getopt but instead variables &
>> setting the flags later was better, and came up with this on top. Not
>> for this series, more to muse on how we can write these subcommands in a
>> simpler manner (or not).
>
> Sure, I think that everything you wrote here is correct. I don't have a
> strong opinion, really, but one benefit of not manipulating a single
> 'flags' int is that we don't have to do stuff like:
>
>   if (git_config_bool(var, value))
>     opts.flags |= FLAG_FOO;
>   else
>     opts.flags &= ~FLAG_FOO;
>
> and instead can write `opts.foo = git_config_bool(var, value)`.

*nod*

> Of course, the trade-off is that you later have to turn `opts.foo` into
> flags at some point (or pass each option as an individual parameter). So
> nothing's free, and I see it as a toss-up between which is easier to
> read and write.

I think that trade-off is usually a benefit, also in the pack-write.c
etc. case, i.e. you enforce a clear boundary between the built-in and
the underlying API, and don't have to e.g. wonder if some write flag is
handled by verify() (which will just care about the progress flag), as
you pass some moral equivalent of a "struct
all_the_options_for_all_the_things" around between all of them.

I realize trying to solve that problem from a different angle may be how
you ended up going for per-subcommand config reading....

>> I may have discovered a subtle bug in the process, in
>> cmd_multi_pack_index_repack() we end up calling write_midx_internal(),
>> which cares about MIDX_WRITE_REV_INDEX, but only
>> cmd_multi_pack_index_write() will set that flag, both before & after my
>> patch. Are we using the wrong flags during repack as a result?
>
> Only the `write` sub-command would ever want to set that flag, since we
> don't support writing a bitmap after `expire`. So that part seems right,
> but perhaps there is a another problem you're seeing?

In midx_repack() we'll call write_midx_internal(). That function gets
the "flags" we pass to midx_repack() and will check
MIDX_WRITE_REV_INDEX. I haven't checked whether we actually reach that,
but that's what I was wondering, i.e. whether the repack routine would
"write" when repacking, and we missed the flag option there.
Taylor Blau Sept. 9, 2021, 4:23 p.m. UTC | #8
On Thu, Sep 09, 2021 at 05:50:47PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> I may have discovered a subtle bug in the process, in
> >> cmd_multi_pack_index_repack() we end up calling write_midx_internal(),
> >> which cares about MIDX_WRITE_REV_INDEX, but only
> >> cmd_multi_pack_index_write() will set that flag, both before & after my
> >> patch. Are we using the wrong flags during repack as a result?
> >
> > Only the `write` sub-command would ever want to set that flag, since we
> > don't support writing a bitmap after `expire`. So that part seems right,
> > but perhaps there is a another problem you're seeing?
>
> In midx_repack() we'll call write_midx_internal(). That function gets
> the "flags" we pass to midx_repack() and will check
> MIDX_WRITE_REV_INDEX. I haven't checked whether we actually reach that,
> but that's what I was wondering, i.e. whether the repack routine would
> "write" when repacking, and we missed the flag option there.

I don't think it's a problem in practice. We would never have
MIDX_WRITE_REV_INDEX set when executing cmd_multi_pack_index_repack(),
(and the same is true for all other subcommands besides `write`) because
the default value for flags is just MIDX_PROGRESS (if isatty(2)), and we
only add the WRITE_REV_INDEX bit from within the write handler.

More generally, that is to say "we only support writing a bitmap from
the `write` sub-command". There is no reason that we couldn't lift this
limitation and support writing a bitmap on the resulting MIDX after
`expire` or `repack` we just haven't done so.

But I don't see any problems with not getting the right flags, etc.

Thanks,
Taylor
Junio C Hamano Sept. 13, 2021, 12:38 a.m. UTC | #9
Taylor Blau <me@ttaylorr.com> writes:

> +static int git_multi_pack_index_write_config(const char *var, const char *value,
> +					     void *cb)
> +{
> +	if (!strcmp(var, "pack.writebitmaphashcache")) {
> +		if (git_config_bool(var, value))
> +			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> +		else
> +			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
> +	}
> +
> +	/*
> +	 * No need to fall-back to 'git_default_config', since this was already
> +	 * called in 'cmd_multi_pack_index()'.
> +	 */

It's probably not just "No need to", but calling default_config() or
any "more generic" config this late is a wrong pattern, as the one
that was already called in the caller may have been overridden by
the command line options parser, and calling "more generic" config
callback after that would be a no-no.  So I think this should read
"we should never make a fall-back call to ...", instead.

Granted, cmd_multi_pack_index() only uses default_config(), and what
it sets will not be overridden by the command line argument parser
of the multi-pack-index command, so it is not a problem yet, but if
we ever introduce configuration variables that are applicable to
multiple subcommands of multi-pack-index, this distinction may start
to matter.

Unless a Git subcommand cannot work with just a single call to
git_config() with a callback upfront, I actually think it may be
better for them to issue a more pointed git_config_get_*() on the
variables they are interested in, never making a separate call to
git_config().

But that can be left for the future.
Taylor Blau Sept. 14, 2021, 1:15 a.m. UTC | #10
On Sun, Sep 12, 2021 at 05:38:36PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > +static int git_multi_pack_index_write_config(const char *var, const char *value,
> > +					     void *cb)
> > +{
> > +	if (!strcmp(var, "pack.writebitmaphashcache")) {
> > +		if (git_config_bool(var, value))
> > +			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> > +		else
> > +			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
> > +	}
> > +
> > +	/*
> > +	 * No need to fall-back to 'git_default_config', since this was already
> > +	 * called in 'cmd_multi_pack_index()'.
> > +	 */
>
> It's probably not just "No need to", but calling default_config() or
> any "more generic" config this late is a wrong pattern [...]

Makes sense to me, will apply. Thanks!

Thanks,
Taylor
diff mbox series

Patch

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 763f7af7c4..ad7f73a1ea 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -159,6 +159,10 @@  pack.writeBitmapHashCache::
 	between an older, bitmapped pack and objects that have been
 	pushed since the last gc). The downside is that it consumes 4
 	bytes per object of disk space. Defaults to true.
++
+When writing a multi-pack reachability bitmap, no new namehashes are
+computed; instead, any namehashes stored in an existing bitmap are
+permuted into their appropriate location when writing a new bitmap.
 
 pack.writeReverseIndex::
 	When true, git will write a corresponding .rev file (see:
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 73c0113b48..dd1652531b 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -61,6 +61,23 @@  static struct option *add_common_options(struct option *prev)
 	return parse_options_concat(common_opts, prev);
 }
 
+static int git_multi_pack_index_write_config(const char *var, const char *value,
+					     void *cb)
+{
+	if (!strcmp(var, "pack.writebitmaphashcache")) {
+		if (git_config_bool(var, value))
+			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+		else
+			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
+	}
+
+	/*
+	 * No need to fall-back to 'git_default_config', since this was already
+	 * called in 'cmd_multi_pack_index()'.
+	 */
+	return 0;
+}
+
 static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
 	struct option *options;
@@ -73,6 +90,10 @@  static int cmd_multi_pack_index_write(int argc, const char **argv)
 		OPT_END(),
 	};
 
+	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+
+	git_config(git_multi_pack_index_write_config, NULL);
+
 	options = add_common_options(builtin_multi_pack_index_write_options);
 
 	trace2_cmd_mode(argv[0]);
diff --git a/midx.c b/midx.c
index ccdc3e5702..6c35dcd557 100644
--- a/midx.c
+++ b/midx.c
@@ -1008,9 +1008,13 @@  static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 	struct pack_idx_entry **index;
 	struct commit **commits = NULL;
 	uint32_t i, commits_nr;
+	uint16_t options = 0;
 	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
 	int ret;
 
+	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
+		options |= BITMAP_OPT_HASH_CACHE;
+
 	prepare_midx_packing_data(&pdata, ctx);
 
 	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
@@ -1049,7 +1053,7 @@  static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 		goto cleanup;
 
 	bitmap_writer_set_checksum(midx_hash);
-	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
+	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, options);
 
 cleanup:
 	free(index);
diff --git a/midx.h b/midx.h
index aa3da557bb..541d9ac728 100644
--- a/midx.h
+++ b/midx.h
@@ -44,6 +44,7 @@  struct multi_pack_index {
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 #define MIDX_WRITE_BITMAP (1 << 2)
+#define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 char *get_midx_filename(const char *object_dir);